Chapter 1
Introduction 


In the past 15 years or so, Metal-Oxide-Semiconductor (MOS) technologies have had a major impact on the computer field. The effect of MOS technologies has been based on the rapidly increasing scale of integration achievable; the number of devices which can be placed on a single chip has been doubling every few years and will probably continue to do so for several more years. MOS technologies also give the ability to store charge dynamically for periods of time that are long relative to the device switching times. The combination of circuit density and dynamic storage has made possible such developments as the one-transistor dynamic RAM, the EPROM and the single-chip microprocessor. With the exception of the dynamic RAM, which has made possible the multi-megabyte main memories of large computer systems, the impact of MOS technological developments has been felt mainly in the lower performance end of the computer spectrum.

MOS has remained a low-end logic technology because bipolar technologies have always had a significant speed advantage over MOS. All of the present generation supercomputers are based on high-speed bipolar Emitter-Coupled Logic (ECL) families. These technologies are very fast but consume large amounts of power; this limits the amount of circuitry which can be placed on a single chip to orders of magnitude less than is achievable in MOS. Machines constructed using ECL components have tended to be very fast but also very expensive to design, construct, operate and maintain.

As the minimum dimensions drop into the submicron range, many of the tradeoffs are changing. As technologies scale, feature sizes tend to shrink by roughly the same factor in all three dimensions. Bipolar devices are constructed vertically, and their switching speed depends on the width of the base region. MOS devices are constructed horizontally, and their switching speed depends on the current drive capabilities of the devices and the capacitive load presented by the gates of the devices. When all dimensions scale down by a factor of $1/r$, the switching speed of the bipolar devices will increase by a factor of $r$ due to the decrease in the base width; the speed of the MOS devices will increase by a factor of $r^2$, because the currents increase by a factor of $r$ and the capacitive loads decrease by the same factor. As the speed advantage of ECL decreases, the density and power advantages of MOS technologies become more important in the decision between the technologies. With the current industry-wide switch from NMOS technologies to Complementary MOS (CMOS) technologies, the power advantage of MOS has increased even further.
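A minimal sketch of the scaling argument above, assuming only the first-order factors stated in the text (speedup $r$ for bipolar, $r^2$ for MOS). The function names and the hypothetical starting ratio of 4 are illustrative, not taken from any measured technology.

```python
def scaled_speedups(r: float) -> tuple:
    """Speedups (bipolar, MOS) when every dimension shrinks by 1/r.

    First-order argument from the text: bipolar speed follows the base
    width (factor r); MOS speed gains a factor r from higher drive current
    and another factor r from smaller gate load (factor r**2).
    """
    return r, r * r


def ecl_advantage_after_scaling(initial_advantage: float, r: float) -> float:
    """ECL-to-MOS speed ratio after scaling, given the ratio before scaling."""
    bipolar, mos = scaled_speedups(r)
    return initial_advantage * bipolar / mos


# Hypothetical starting point: ECL 4x faster than MOS, dimensions halved.
print(ecl_advantage_after_scaling(4.0, 2.0))  # 2.0
```

Under these assumptions every halving of the dimensions halves whatever speed advantage ECL has left.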

As MOS technologies move into higher speed applications, new concerns are becoming important that MOS designers have not previously had to address. Specifically, in past MOS-based systems the clock frequencies and data rates have typically been so low that inter-chip communication has not been a serious problem. As frequencies climb into the 50-100 MHz range, the delay associated with sending data from one chip to another can easily equal or exceed the clock period.

ECL machine designers have been addressing this problem for years, but the change in technologies has added new twists to the problem, as well as introducing the possibility of new approaches to solving it.

1.1  Synopsis 

Chapter 2 will discuss the constraints which the designer of a high-speed computer must follow in order to ensure the computer operates properly. Particular attention is given to how characteristics of the MOS technologies and assumptions regarding the clocking strategy impact the constraints. Chapter 3 will introduce a circuit-based synchronization technique known as Dynamic Delay Adjustment (DDA) and discuss an implementation of the technique in a CMOS technology. The implementation has been fabricated and tested, and the results are presented and discussed. Appendix A presents an analysis of the bistable latches in order to predict the probability of a synchronization failure occurring in the DDA synchronizer.


Chapter  2 


Delay Constraints in High-Speed Synchronous VLSI Systems

This chapter will discuss delay constraints in high-speed synchronous systems based on MOS VLSI chips. Specifically, the constraints involved with transferring data between chips in such a system will be of the most interest. The architecture of the system will be assumed to be highly pipelined, using a two-phase non-overlapping clock to control the flow of data between pipeline stages.

Figure 2.1 illustrates how communication between chips will be modeled. The XMTR chip is sending data to the RCVR chip. The output of the XMTR is simply a latch clocked by $\phi_2$ followed by a buffer; the input to the RCVR consists of another latch, clocked by $\phi_1$, which is possibly preceded by an amplifier. The line connecting the two chips is shown as a terminated transmission line; the line could also have been modeled as a simple capacitor.

The constraints this type of system imposes on the designer will be the subject of this chapter. Although the synchronous model used here is the most commonly used timing scheme, this scheme is used only to provide a framework for the discussions in the chapter. Other assumptions and timing models could just as easily be used; in most cases the problems discussed and conclusions drawn in this chapter would still be valid.

Figure 2.1: Modeling Inter-Chip Communication


2.1 CMOS Latches Under Marginal Switching Conditions

Figure 2.2(a) illustrates one of the simplest types of dynamic latch possible in a CMOS technology, a series combination of 2 p-type and 2 n-type transistors; a slightly more complicated static latch, shown in Figure 2.2(b), uses 2 of the dynamic latches and 2 cross-coupled inverters. Both latches operate in a manner which is fundamentally different from the edge-triggered latches commonly used in TTL and ECL machines; these latches are more accurately described as level-triggered. Edge-triggered latches are designed so that the outputs only change during a very short time, known as the hold time or $t_H$, immediately following the rising edge of the clock. For latches like the 2 CMOS latches shown, the outputs will respond to transitions on the data input with some small delay as long as $\phi$ is HIGH. As will be seen in forthcoming sections, this behavior of the CMOS latches has a big effect on the types of clocking schemes which can be used in MOS-based systems.

Figure 2.2: CMOS Latches. (a) A Simple Dynamic Latch (b) A Bistable Latch
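To make the level-triggered versus edge-triggered distinction concrete, here is an idealized discrete-time sketch (zero delay, logic values only). The function names and sample waveforms are illustrative assumptions, not taken from the thesis; the level-triggered model corresponds to CMOS latches that are transparent while the clock is HIGH.

```python
def level_triggered(clk, d, q0=0):
    """Transparent latch: Q follows D whenever the clock is HIGH."""
    q, out = q0, []
    for c, x in zip(clk, d):
        if c:
            q = x
        out.append(q)
    return out


def edge_triggered(clk, d, q0=0):
    """Edge-triggered latch: Q samples D only on a LOW-to-HIGH clock edge."""
    q, prev, out = q0, 0, []
    for c, x in zip(clk, d):
        if c and not prev:
            q = x
        prev = c
        out.append(q)
    return out


clk = [0, 1, 1, 1, 0, 0, 1, 1]
d = [0, 1, 0, 1, 1, 0, 0, 1]
print(level_triggered(clk, d))  # [0, 1, 0, 1, 1, 1, 0, 1]
print(edge_triggered(clk, d))   # [0, 1, 1, 1, 1, 1, 0, 0]
```

The transparent latch passes every data change that occurs while the clock is HIGH, which is why its phase relationship to the falling clock edge matters so much in what follows.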

Both types of latch have relatively short delays. For the dynamic latch, the delay will be slightly longer than a typical inverter delay due to the two devices in series in both the pullup and the pulldown paths. For the static latch, the delay will depend on the relative sizes of the devices in the inverters and in the latches; in order for the state of the latch to switch, the latch devices must be much larger than the devices in the inverters. Under normal switching conditions, data transitions occur far enough before the falling edge of the clock that the outputs have settled to logically valid levels. When data transitions happen very close to the falling clock edge, the outputs may not have sufficient time to switch completely. Figure 2.3 shows the responses of these two latches to data transitions which fall very close to the clock edge. Both latches are assumed to drive some moderate load capacitance consisting of other logic gates.

The dynamic latch's output fails to switch completely before the clock signal turns off the two middle devices in the stack. Once these devices are off, the output impedance of the latch becomes very high; this high impedance prevents the capacitance loading the output from either accumulating or losing charge. The output of the latch will stay at the intermediate value until $\phi$ goes HIGH again on the next cycle.
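The freezing effect can be illustrated with a deliberately crude toy model: while the clock is HIGH the output charges through a single time constant, and when the clock falls the floating node simply keeps its voltage. The single time constant, the 5 V supply and the 0.5 ns value of tau are assumptions for illustration only; a real latch output is not a simple RC charge, and this sketch does not reproduce Figure 2.3 quantitatively.

```python
import math


def dynamic_latch_output(t_margin: float, tau: float, vdd: float = 5.0) -> float:
    """Voltage left on a dynamic latch output that was rising from 0 V.

    Toy model: while the clock is HIGH the output charges toward vdd with a
    single time constant tau; once the clock falls the node floats and holds
    whatever voltage it had reached.  t_margin is the time from the data
    transition to the falling clock edge (non-positive: output never moved).
    """
    if t_margin <= 0.0:
        return 0.0
    return vdd * (1.0 - math.exp(-t_margin / tau))


# Sweep the data transition across a 2 ns window (tau = 0.5 ns, hypothetical).
for t in [-1.0e-9, -0.5e-9, 0.0, 0.5e-9, 1.0e-9]:
    print(f"{t * 1e9:+.1f} ns -> {dynamic_latch_output(t, 0.5e-9):.2f} V")
```

Margins of a fraction of tau leave the node stranded between the rails, which is exactly the failure the system designer must make improbable.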

The situation is different for the bistable latch; the clock signal going LOW before the outputs are stable stops the output transitions just before the two output voltages are equal. However, the feedback present in the latch amplifies the small differential voltage and forces the outputs back to legal logic values. Appendix A will consider the bistable latch in much greater detail.
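The regeneration just described is commonly approximated with a linear small-signal model in which the differential output voltage grows exponentially from its initial value. The sketch below assumes that model; the time constant tau and the voltage values are hypothetical, and Appendix A's own analysis may differ in detail.

```python
import math


def regeneration_time(dv0: float, dv_valid: float, tau: float) -> float:
    """Time for a cross-coupled latch to grow |dv0| to dv_valid.

    Small-signal assumption: near the balance point the differential output
    grows as |dv(t)| = |dv0| * exp(t / tau), where tau (hypothetical here)
    is set by the inverters' gain and load capacitance.
    """
    dv0 = abs(dv0)
    if dv0 == 0.0:
        raise ValueError("an exactly balanced latch never resolves in this model")
    if dv0 >= dv_valid:
        return 0.0
    return tau * math.log(dv_valid / dv0)


# Each 10x smaller starting imbalance costs another tau * ln(10) of settling.
print(regeneration_time(1e-3, 1.0, 0.2e-9))
print(regeneration_time(1e-6, 1.0, 0.2e-9))
```

Because the required time grows only logarithmically as the initial imbalance shrinks, the latch eventually resolves any non-zero imbalance, but no finite settling time covers every case.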

The job of the system designer is to ensure that failures such as these occur very rarely. To see just how low the probability of failure must be to ensure reliable system operation, consider the following example of a high-performance multiprocessor. Assume there are 10,000 processors, each of which has a single serial port which runs at 100 MHz; the switch network required to interconnect this many processors would probably require around 5 stages of switching elements, with each stage having on the order of 10,000 inputs which must be latched on every cycle. Assuming the computer operates 24 hours a day for 250 days a year¹, then in order to have one latch failure a year on the average, the probability of a given latch failing on any particular cycle would have to be less than about $10^{-20}$.
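The arithmetic behind the $10^{-20}$ figure can be checked directly. This sketch uses only the quantities given in the example and reads the bound as a per-latch, per-cycle failure probability (that reading is an interpretation of the text).

```python
stages = 5                           # switching stages in the network
latches_per_stage = 10_000           # inputs latched per stage on every cycle
clock_hz = 100e6                     # 100 MHz serial ports
seconds_per_year = 24 * 3600 * 250   # 24 hours a day, 250 days a year

latches_per_cycle = stages * latches_per_stage               # 50,000
cycles_per_year = clock_hz * seconds_per_year                # 2.16e15
latch_events_per_year = latches_per_cycle * cycles_per_year  # about 1.08e20

# One failure per year on average means p * latch_events_per_year <= 1.
p_max = 1.0 / latch_events_per_year
print(f"{p_max:.2e}")  # 9.26e-21
```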


2.2  Specification  of  MOS  Logic  Families 

The previous section illustrated one type of behavior that can result from improperly latching signals in MOS systems. In order to avoid such situations, designers must be able to predict the performance of the MOS components before they are fabricated. This section will discuss how the specifications required by a digital system can be mapped onto a technology like CMOS. The behavior of most interest here is the dynamic behavior, so only a brief discussion of the static characterization will be given.


¹Leaving almost 1/3 of the year unused.

Figure 2.4: An Inverter's Static Transfer Characteristics and the Corresponding Logic Family's Level Specifications

2.2.1  Static  Specification  of  MOS  Logic  Families 

Figure 2.4 illustrates the static electrical specifications for a hypothetical MOS logic family; the specifications consist of separate HIGH and LOW voltage levels for inputs, $V_{IH}$ and $V_{IL}$, and outputs, $V_{OH}$ and $V_{OL}$ [12]. Any logic gate in this family must obey the requirement that if all of the gate's inputs are either above $V_{IH}$ or below $V_{IL}$, then the gate's output must be either above $V_{OH}$ or below $V_{OL}$. The shaded portion of the figure shows the portion of the $V_{in}$ vs. $V_{out}$ plane in which valid logic signals must fall.

Every  logic  gate  will  have  a  unique  transfer  function  which  characterizes 
the  output  voltage  as  a  function  of  the  input  voltage(s).  Figure  2.4  also 
shows  the  transfer  function  for  an  inverter  whose  static  transfer  function 


CHAPTER  2.  DELAY  CONSTRAINTS 


18 


$V_{out} = H_S(V_{in})$ meets the requirements of the logic family; that is:

$$H_S(V_{IL}) > V_{OH} \tag{2.1}$$

$$H_S(V_{IH}) < V_{OL} \tag{2.2}$$
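A sketch of how Eqs. 2.1 and 2.2 might be checked numerically for a candidate gate. The logic levels, the 5 V supply and the tanh-shaped inverter curve are all hypothetical stand-ins; a real check would use a simulated or measured $H_S$. For a monotonic inverter only the endpoints $V_{IL}$ and $V_{IH}$ matter, but the loop checks the whole legal input ranges so it also works for non-monotonic curves.

```python
import math

# Hypothetical family levels in volts; none of these values come from the thesis.
VDD = 5.0
V_IL, V_IH = 1.5, 3.5
V_OL, V_OH = 0.5, 4.5


def inverter(v_in: float, gain: float = 8.0) -> float:
    """Smooth stand-in for an inverter's static transfer function H_S."""
    return 0.5 * VDD * (1.0 - math.tanh(gain * (v_in - 0.5 * VDD) / VDD))


def meets_static_spec(h, steps: int = 1000) -> bool:
    """Check Eqs. 2.1 and 2.2 over the full legal input ranges.

    Every input in [0, V_IL] must produce an output above V_OH, and every
    input in [V_IH, VDD] must produce an output below V_OL.
    """
    for i in range(steps + 1):
        low_in = V_IL * i / steps
        high_in = V_IH + (VDD - V_IH) * i / steps
        if h(low_in) <= V_OH or h(high_in) >= V_OL:
            return False
    return True


print(meets_static_spec(inverter))                         # True
print(meets_static_spec(lambda v: inverter(v, gain=2.0)))  # False
```

A gate with too little gain fails because its transfer curve passes through the forbidden region of the $V_{in}$ vs. $V_{out}$ plane.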

One point that is immediately obvious is that this emulation of a binary system is basically a failure, in the sense that there are states in the technology which are logically undefined; this is not the case for a truly binary system. It is exactly this failure which causes the kinds of problems seen in Figure 2.3; in a truly binary system, latching an illegal output state would be impossible. Unfortunately, it is impossible to build a truly binary logic system out of the continuous media available.

2.2.2  Dynamic  Specification  of  MOS  Logic  Families 

Just as the logic levels were directly related to the static electrical characteristics of the logic gates comprising a family of digital logic, it should be possible to characterize the dynamic performance of the same gates. An important concept in the characterization presented here will be the use of characteristic waveforms; assuming that all signals of interest have some typical waveshape will simplify the definition of the delay between signals. A standard method for determining the delay between the data and control inputs to a gate and the resulting data outputs is necessary for characterizing the dynamic performance of a MOS logic family.

Characteristic  Waveforms 

In MOS technologies, logic gates drive loads which are almost purely capacitive. The major components of these loads are the gate and Miller capacitances of the MOS transistors being driven and the parasitic capacitances due to the diffusion, polysilicon and metal used to interconnect the devices. When long lines of polysilicon or metal interconnect must be driven, there can also be significant resistive and possibly even inductive components in the load, but these cases are not typical.

The nonlinear nature of the devices and the gate capacitances makes it difficult to generate closed-form descriptions of the waveforms. However, by making a few simplifying assumptions it is possible to accurately predict the responses of MOS logic gates to simple inputs like steps and ramps [18] [14]. The exact form of the predicted responses will depend on the assumptions made and the definition of delay used. In any case, however, the delay will be determined mainly by the electron and hole mobilities, the power supply voltage and the fanout ratio of the circuit: it is roughly inversely proportional to the mobilities and the supply voltage and directly proportional to the fanout. Other important parameters are the transistor thresholds and the ratio of the n-type transistor widths to the p-type transistor widths [18].
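The dependences listed above can be seen in a simple first-order estimate: delay is roughly load charge divided by drive current. This sketch assumes a square-law saturation current and uses hypothetical device parameters; it is not the model of [18] or [14], only an illustration of the trends.

```python
def gate_delay(fanout, mobility, vdd, vt, c_in=10e-15, w_over_l=2.0, cox=1.7e-3):
    """First-order delay of a MOS gate driving `fanout` copies of itself (seconds).

    Square-law assumption: the driving transistor supplies its saturation
    current I = 0.5 * mobility * cox * (W/L) * (vdd - vt)**2, and the delay is
    the time to move the load C = fanout * c_in through vdd/2.  SI units; the
    default device values are hypothetical, and a single mobility stands in
    for the electron or hole mobility of whichever device is switching.
    """
    i_sat = 0.5 * mobility * cox * w_over_l * (vdd - vt) ** 2
    c_load = fanout * c_in
    return c_load * (0.5 * vdd) / i_sat


print(gate_delay(fanout=4, mobility=0.05, vdd=5.0, vt=0.8))
```

In this model the delay doubles with fanout and halves with mobility; with a zero threshold it falls as $1/V_{DD}$, and a non-zero threshold makes the supply dependence stronger.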

Deriving expressions for the response of single gates to simple input waveforms is helpful, but the response of several gates cascaded together is much more interesting and useful to the circuit designer. It is more difficult, but not impossible, to express the response of cascaded gates in an analytical form [14]. Figure 2.5 shows the response of a string of inverters to a step input. The device sizes have been chosen to make the fanout of every stage approximately equal.
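The settling of cascaded waveforms toward a common shape can be reproduced with a toy model, assuming each inverter is a static tanh-shaped curve followed by a first-order lag. This is not a transistor-level simulation like Figure 2.5; the gain, time constant and step size are hypothetical, and the sketch only shows that per-stage delays (measured at $V_{DD}/2$) converge after a couple of stages.

```python
import math


def inverter_chain_crossings(n_stages=6, tau=1.0, gain=10.0, dt=1e-3, t_end=20.0):
    """Times at which each stage of an inverter chain crosses VDD/2.

    Toy model: each stage is a static inverting curve h(v) followed by a
    first-order lag with time constant tau (VDD normalized to 1), integrated
    with forward Euler.  The chain input steps from 0 to 1 at t = 0.
    """
    def h(v):
        return 0.5 * (1.0 - math.tanh(gain * (v - 0.5)))

    # DC operating point with the input at 0: outputs alternate high/low.
    v, x = [], 0.0
    for _ in range(n_stages):
        x = h(x)
        v.append(x)

    crossings = [None] * n_stages
    t = 0.0
    while t < t_end and any(c is None for c in crossings):
        drive = [1.0] + v[:-1]  # stage k is driven by stage k-1
        new_v = [vk + dt * (h(u) - vk) / tau for vk, u in zip(v, drive)]
        t += dt
        for k in range(n_stages):
            if crossings[k] is None and (v[k] - 0.5) * (new_v[k] - 0.5) <= 0.0:
                crossings[k] = t
        v = new_v
    return crossings


times = inverter_chain_crossings()
stage_delays = [times[0]] + [b - a for a, b in zip(times, times[1:])]
print([round(d, 3) for d in stage_delays])
```

The first stage, driven by an ideal step, has a delay of about tau ln 2; later stages, driven by realistic edges, settle to a nearly constant delay.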

It can be seen that after only a couple of stages the waveforms begin to look almost identical. This is a common characteristic of MOS circuits and makes it possible to discuss delays in terms of characteristic waveforms [2] rather than trying to characterize signals in terms of steps and ramps. The delay between two edges is defined as the time between the $V_{DD}/2$ points on each edge. This definition does not depend on rise and fall times, or even on whether the edges are rising or falling.

This definition of delay between signals can be extended to define the phase margin, $t_{PM}$, of a clocked latch as the delay between the data and clock inputs to the latch. The next section will examine the relationship between $t_{PM}$ for a latch and the resulting performance of the latch.
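A sketch of the $V_{DD}/2$ delay definition and of $t_{PM}$ applied to sampled waveforms. The linear interpolation between samples and the ramp waveforms in the example are assumptions for illustration; the sign convention (positive when data precedes the clock) follows the text.

```python
def crossing_time(times, volts, level):
    """First time a sampled waveform passes through `level` (linear interpolation)."""
    for i in range(len(times) - 1):
        v0, v1 = volts[i], volts[i + 1]
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0.0:
            return times[i] + (level - v0) * (times[i + 1] - times[i]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def edge_delay(times, v_from, v_to, vdd):
    """Delay from one edge to another, measured between their VDD/2 points.

    Works the same whether either edge is rising or falling.
    """
    return crossing_time(times, v_to, vdd / 2) - crossing_time(times, v_from, vdd / 2)


def phase_margin(times, v_data, v_clock, vdd):
    """t_PM: delay from the data transition to the clock transition.

    Positive when the data edge precedes the clock edge.
    """
    return edge_delay(times, v_data, v_clock, vdd)


# Hypothetical waveforms (ns, volts): data rises around 2.5 ns, clock falls around 4.5 ns.
t = [0.1 * i for i in range(101)]
data = [min(max(5.0 * (x - 2.0), 0.0), 5.0) for x in t]
clock = [5.0 - min(max(5.0 * (x - 4.0), 0.0), 5.0) for x in t]
print(round(phase_margin(t, data, clock, 5.0), 6))  # 2.0
```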
Figure 2.5: The Step Response of a String of CMOS Inverters

Dynamic  Transfer  Functions  of  CMOS  Latches 

Section 2.1 illustrated the response of both dynamic and static CMOS latches to nearly simultaneous data and clock transitions. This section will consider in more detail how the response of the dynamic latch depends on the phase margin of the data input with respect to the clock. Similar characteristics could be studied for the static latch as well. Appendix A analyzes the static latch in a slightly different manner; that analysis will be used to extend this section's observations to static latches in a qualitative sense.

Figure 2.3(a) showed the response of a dynamic CMOS latch to a data transition just before the falling clock edge. For this particular arrangement of data and clock edges, the output is latched just above $V_{DD}/2$ and does not change until the clock goes HIGH again. Had the data transition occurred one nanosecond earlier, the output would have switched completely; a transition one nanosecond later would not have caused any movement of the output until the next clock cycle.

By simulating the latch for data transition times ranging over the 2 ns interval just mentioned, the voltage at which the output is latched can be determined as a function of the phase margin of the data signal. Figure 2.6 shows the results of such a series of simulations; the figure plots the latched output voltage, $V_{latch}$, versus the phase margin, $t_{PM}$.

This plot defines the dynamic transfer function for the latch, $H_D(t_{PM})$. This transfer function is similar to the more familiar static one, $H_S(V_{in})$, shown in Figure 2.4, in that it provides a one-to-one mapping of an input characteristic to an output voltage. The static mapping was used to generate the specifications for the static logic level definitions; this dynamic transfer function will be used to define delay constraint specifications, i.e. setup and hold times. Section 2.5 will use both sets of specifications to discuss how the relative noise immunity of a latch relates to the specifications.

There is some small window of phase margins which will result in a voltage being latched which violates the static logic voltage levels specified for the technology. Due to the definition of delay, one of the boundaries of this window may be a negative phase margin, i.e. the data transition occurs slightly after the clock transition. By convention, this boundary will be called the hold time of the latch, $t_H$. The positive boundary of the window will be called the setup time of the latch, $t_S$, also by convention. The setup and hold times must obey the following rules in order to ensure the latch does not violate the static logic level specifications²:

$$V_{latch}(t_S) = H_D(t_S) > V_{OH} \tag{2.3}$$

and

$$V_{latch}(t_H) = H_D(t_H) < V_{OL} \tag{2.4}$$

As long as the designer ensures that the data and clock transitions always obey these setup and hold time specifications, the latch will never latch a logically undefined voltage.
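Given a sampled dynamic transfer function, $t_S$ and $t_H$ are the phase margins at which the curve passes through $V_{OH}$ and $V_{OL}$. The sketch below assumes a rising output transition (so $H_D$ increases with $t_{PM}$) and uses a hypothetical logistic curve in place of simulated data like Figure 2.6.

```python
import math


def setup_hold_times(t_pm, v_latch, v_ol, v_oh):
    """Return (t_S, t_H) from a sampled dynamic transfer function H_D(t_PM).

    Assumes a rising output transition, so v_latch increases with t_PM:
    t_S is where H_D rises through V_OH (Eq. 2.3) and t_H is where it rises
    through V_OL (Eq. 2.4), found by linear interpolation between samples.
    """
    def crossing(level):
        for i in range(len(t_pm) - 1):
            v0, v1 = v_latch[i], v_latch[i + 1]
            if v0 <= level <= v1 and v0 != v1:
                return t_pm[i] + (level - v0) * (t_pm[i + 1] - t_pm[i]) / (v1 - v0)
        raise ValueError("sampled curve does not span the requested level")

    return crossing(v_oh), crossing(v_ol)


# Hypothetical logistic H_D (ns, volts) standing in for a simulated curve.
t_pm = [-2.0 + 0.001 * i for i in range(4001)]
v_latch = [5.0 / (1.0 + math.exp(-t / 0.2)) for t in t_pm]
t_s, t_h = setup_hold_times(t_pm, v_latch, v_ol=0.5, v_oh=4.5)
print(round(t_s, 3), round(t_h, 3))  # 0.439 -0.439
```

The steeper the switching portion of $H_D$, the narrower the forbidden window $t_S - t_H$.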

Appendix A analyzes the behavior of the bistable latch in a slightly different manner; the conclusions drawn in the Appendix will be used to extend the discussion of this section to bistable latches. The first conclusion the Appendix draws is that any non-zero initial differential voltage between the static latch's outputs will eventually be amplified until the outputs reach logically defined states. Secondly, if the phase margin is completely arbitrary, then for any amount of time the latch is given to settle there is a non-zero probability that the metastable state will persist longer than the settling time; this probability decays exponentially with increasing settling time.

Assuming that the system is using 2-phase non-overlapping clocks and one phase is clocking the input to the static latch while the other phase is sampling the outputs, the settling time of the latch would be $T_c/2$ seconds, where $T_c$ is the clock period. Simulations could once again be used to generate an $H_D(t_{PM})$ transfer function for the static latch by taking $V_{latch}$ to be one of the output voltages $T_c/2$ seconds after the first clock edge. Such a function would look similar to Figure 2.6 in most respects, but the switching portion of the curve would be much sharper, resulting in a much smaller difference between $t_S$ and $t_H$ for the static latch than for the dynamic latch. In situations where the phase margin of the input cannot be well controlled, the static latch provides much lower failure probabilities than the dynamic latch.
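Combining the exponential decay of failure probability with a $T_c/2$ settling window gives a lower bound on the clock period for a target reliability. This sketch assumes the simple form $P = p_0 e^{-t/\tau}$; the prefactor $p_0 = 1$ and the 0.2 ns time constant are hypothetical, not values from Appendix A.

```python
import math


def failure_probability(t_settle, tau, p0=1.0):
    """Probability that metastability outlasts t_settle: p0 * exp(-t_settle / tau).

    The exponential form is the Appendix A conclusion quoted above; the
    prefactor p0 and time constant tau are hypothetical latch parameters.
    """
    return p0 * math.exp(-t_settle / tau)


def min_clock_period(p_target, tau, p0=1.0):
    """Shortest Tc whose Tc/2 settling window brings the failure probability to p_target."""
    return 2.0 * tau * math.log(p0 / p_target)


# Example: the 1e-20 per-latch target from Section 2.1 with a hypothetical tau of 0.2 ns.
tc = min_clock_period(1e-20, 0.2e-9)
print(f"{tc * 1e9:.1f} ns")  # 18.4 ns
```

Each further factor of $e$ in reliability costs one more $\tau$ of settling time, i.e. $2\tau$ of clock period.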

²Assuming a rising output transition.

2.3 The Synchronous Operation Model

The previous section defined setup and hold time specifications for clocked latches which, when obeyed, ensure that the latches' outputs will always be logically valid. How the designer approaches ensuring that these specifications are not violated depends on how the operation of the computer is being modeled. The most commonly used operating model is the Synchronous Model; the most basic assumption of the synchronous model is that all clock inputs transition simultaneously everywhere in the machine.

In a highly pipelined machine, all signals in the machine propagate between latches, and presumably at some point the phase margin of each signal with respect to one of the clock signals is known. For instance, if the input to a latch is known to change only when the latch's clock is LOW, then the output of the latch will always have a fixed phase margin with respect to the clock. By assuming skewless clock signals, the phase margin of any signal can be found by calculating the delays the signal experiences between the output of a latch at which the phase margin is known and the input of the latch at which the phase margin is needed.

The setup and hold characteristics of the latches limit the delays which are allowable on any particular path. These limits can be expressed as bounds on the signal delays $d_i$, each of which is the sum of a logic delay $d_{iL}$ and a wire delay $d_{iW}$. The clock period $T_c$ and the setup and hold times will combine to provide the designer with bounds on legal values of $d_i$.

A simple synchronous model assumes the minimum delay of a stage of logic is greater than the required hold time but requires all delays to be less than one clock period minus the setup time:

$$t_H < d_i = d_{iL} + d_{iW} < T_c - t_S \tag{2.5}$$

Another way of interpreting this constraint is that the clock period must be longer than the longest delay in the entire machine plus the setup time.
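A minimal sketch of checking Eq. 2.5 over a set of paths and finding the bound the clock period must exceed. The path names and delay values are hypothetical.

```python
def violations_eq_2_5(paths, t_c, t_s, t_h):
    """Paths whose total delay breaks Eq. 2.5, t_H < d_iL + d_iW < T_c - t_S.

    `paths` maps a path name to (logic_delay, wire_delay) in consistent units.
    """
    bad = {}
    for name, (d_logic, d_wire) in paths.items():
        d = d_logic + d_wire
        if not (t_h < d < t_c - t_s):
            bad[name] = d
    return bad


def period_bound(paths, t_s):
    """T_c must exceed the longest path delay plus the setup time."""
    return max(d_logic + d_wire for d_logic, d_wire in paths.values()) + t_s


# Hypothetical paths in ns.
paths = {"alu": (6.0, 1.5), "bus": (3.0, 4.0), "bypass": (0.2, 0.1)}
print(violations_eq_2_5(paths, t_c=9.0, t_s=1.0, t_h=0.5))  # only "bypass" (below t_H)
print(period_bound(paths, t_s=1.0))                          # 8.5
```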

A more general synchronous model relaxes this single clock cycle requirement. Delays may be greater than $T_c$, but the setup and hold requirements must still be followed. Equation 2.5 may be generalized by adding a multiple of $T_c$ to each of the boundaries. In the following constraint $k$ is an integer greater than or equal to 0:

$$kT_c + t_H < d_i < (k+1)T_c - t_S \tag{2.6}$$
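A sketch of applying Eq. 2.6 to a single path: find the number of whole clock periods $k$ the path may occupy, or report that its delay falls in a forbidden window. The numeric values are hypothetical.

```python
import math


def cycle_count(d, t_c, t_s, t_h):
    """Return the integer k >= 0 satisfying Eq. 2.6 for path delay d, or None.

    Eq. 2.6: k*T_c + t_H < d < (k+1)*T_c - t_S.  Only the largest k whose
    lower bound is at or below d (and the one below it, for the boundary
    case) can possibly satisfy the upper bound as well.
    """
    k_hi = math.floor((d - t_h) / t_c)
    for k in (k_hi, k_hi - 1):
        if k >= 0 and k * t_c + t_h < d < (k + 1) * t_c - t_s:
            return k
    return None


# Hypothetical numbers in ns: T_c = 10, t_S = 1, t_H = 0.5.
for d in (7.0, 9.5, 14.0):
    print(d, cycle_count(d, 10.0, 1.0, 0.5))  # 7.0 -> 0, 9.5 -> None, 14.0 -> 1
```

A delay of 9.5 ns lands between $T_c - t_S$ and $T_c + t_H$, so no choice of $k$ makes it legal; the designer must add or remove delay on that path.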

Given a well understood logic technology, determining logic delays may not involve much more than counting the number of stages of logic. Determining the wire delays requires a very thorough understanding of the physical organization of the final machine. The length of the longest wire delays with respect to the logic delays determines whether the two-sided constraints are necessary.

If the one-sided constraints are used, the clock period will be specified by the longest total delay; since the wire delays will vary from path to path, the designer has some flexibility to include more logic in those paths which happen to have shorter wires. However, the architecture of the computer will specify when operations can be performed, limiting this flexibility. Any portion of the clock cycle which cannot be effectively used will be wasted and will lower the efficiency of the machine. The two-sided constraints may make it possible for the designer to increase the fraction of the paths which effectively use the entire clock cycle; however, this increase in efficiency comes at a large expense in design complexity.

Clock  Skew  and  Other  Overheads  in  Synchronous  Systems 

One of the major assumptions behind the delay constraints 2.5 and 2.6 was that clock transitions occur simultaneously in all parts of the computer. In order to make this assumption hold, the clock distribution system must be carefully designed to ensure that each path from the clock source to each chip or group of chips has the same delay.

Matching the wire lengths along the clock distribution paths is feasible. The main source of variation in clock delays will be in the level converters and buffers which will generally be needed to provide clock signals with sufficient drive at the chips. These converters and buffers may be placed on each individual chip, or they may be placed externally and shared among a group of chips. In either case, process variations, thermal fluctuations and load variations will all be sources of delay mismatch, leading to skewing of the clock transitions from chip to chip.

Clock skew due to controllable characteristics like wire lengths can be handled by the designer at the expense of further complexity in the delay constraints. Skew arising from thermal and process fluctuations is unpredictable, and therefore the designer must estimate the worst-case skew and allow extra time in the clock period for it. This is pure overhead which lengthens the clock period without adding performance.
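The cost of that overhead can be sketched with the simplest possible budget, which just adds the worst-case skew to the Eq. 2.5 bound. Both the additive budget and the numbers are assumptions for illustration.

```python
def period_with_skew(max_delay, t_s, max_skew):
    """Simplest budget: longest path + setup time + worst-case clock skew."""
    return max_delay + t_s + max_skew


def skew_overhead(max_delay, t_s, max_skew):
    """Fraction of the clock period spent purely on the skew allowance."""
    return max_skew / period_with_skew(max_delay, t_s, max_skew)


# Hypothetical ns values: 7.5 ns longest path, 1 ns setup, 1.5 ns worst-case skew.
print(period_with_skew(7.5, 1.0, 1.5))  # 10.0
print(skew_overhead(7.5, 1.0, 1.5))     # 0.15
```

Because the skew allowance does not shrink when the logic gets faster, its share of the period grows as clock rates rise.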


2.4  Delays:  Types  and  Scaling 
In order to prevent latch failures from plaguing a computer system, the designer must ensure that no delay paths violate the setup and hold requirements of the latches used to latch the data inputs. In a VLSI system, this can involve working with both on-chip and off-chip delays and signals. This thesis is mainly concerned with inter-chip communication, so the discussion here will primarily concern delays involving inter-chip signals. Section 2.4.1 will touch briefly on the use of equipotential regions to help simplify the delay constraints for on-chip signals.

This section will treat delays as if the actual delay values can be predicted accurately at design time. Due to manufacturing variations, the actual delays present after the system is built may vary from the predicted delays; the designer must make the design robust enough to allow for the variations. The sources of delay variations and the effects of the variations on systems will be discussed along with other noise issues in Section 2.5.
2.4.1 Equipotential Regions

In MOS VLSI circuits, most of the on-chip wires used to interconnect devices and gates are short and can be accurately modeled as purely capacitive loads. However, on most chips there will be long lines of metal or polysilicon needed to connect distant portions of the chip together. These long lines may have a significant resistive component which will cause the line to behave more like a distributed RC line; signals tend to diffuse down such a line, and there can be a noticeable skew between signals at opposite ends of the line. The circuit designer must therefore treat long lines differently when trying to account for delays in the circuit.
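Why long resistive lines need special treatment can be seen from the usual first-order estimate for a distributed RC line, whose 50% step-response delay is commonly approximated as about $0.38RC$ (the Elmore estimate gives $0.5RC$). The sketch assumes that approximation and hypothetical per-unit-length values; since both $R$ and $C$ grow with length, the delay grows with length squared.

```python
def rc_line_delay(length, r_per_len, c_per_len, k=0.38):
    """Approximate 50% step-response delay of a distributed RC line (seconds).

    Uses the common approximation t_50 ~ 0.38 * R * C for a distributed line
    (the Elmore estimate gives 0.5 * R * C).  Total R and C both grow with
    length, so the delay grows with length squared.
    """
    return k * (r_per_len * length) * (c_per_len * length)


# Hypothetical per-unit-length values (ohm/m, F/m) for a long on-chip line.
r, c = 5e4, 2e-10
for mm in (1, 2, 4):
    print(mm, "mm:", rc_line_delay(mm * 1e-3, r, c), "s")
```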

Whether a particular wire can be treated as a lumped capacitive load or not depends on how significant the delay associated with the wire is compared to the intrinsic logic delays in the associated circuitry. This classification has led to the practice of dividing VLSI chips and systems into smaller units called equipotential regions. These regions are chosen small enough that wires contained within a region can be treated as lumped capacitances rather than diffusive RC lines. This simplifies the delay constraints within the region.

How large or small an equipotential region should be is open for debate. Charles Seitz defines the maximum size of an equipotential region as the size small enough that the potential on any wire within the region will equalize in less than $\tau$, where $\tau$ is the delay of one inverter driving another inverter of the same size [30].
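Inverting the distributed-RC estimate gives a rough maximum wire length for an equipotential region under Seitz's criterion. This sketch treats "equalize in less than $\tau$" as "50% delay below $\tau$", which is a simplifying assumption, and all numbers are hypothetical.

```python
import math


def max_equipotential_length(tau_inv, r_per_len, c_per_len, k=0.38):
    """Longest wire whose distributed-RC delay k*r*c*L**2 stays below tau_inv.

    tau_inv is the delay of one inverter driving an identical inverter, as in
    the criterion quoted above; k ~ 0.38 is the usual 50%-delay factor for a
    distributed RC line.  All numeric inputs are hypothetical.
    """
    return math.sqrt(tau_inv / (k * r_per_len * c_per_len))


# Hypothetical: 1 ns inverter delay, 1e4 ohm/m and 2e-10 F/m of wire.
print(max_equipotential_length(1e-9, 1e4, 2e-10))
```

Because the bound scales with $\sqrt{\tau}$, faster gates shrink the equipotential region even if the wiring stays the same.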

How much of a chip can be considered to be an equipotential region would be important to the chip designer, but for the system designer the important point is that signals which pass between two chips will almost always cross the boundaries of an equipotential region. Delays on every wire between chips may have to be considered to ensure proper operation.


2.4.2  Types  of  Delays 

The main sources of delay that affect inter-chip communication in
high-speed systems are:

•  Driving large off-chip loads,

•  Sensing small voltage swings,

•  Transmission line delays.

Driving  Off-Chip  Loads

Driving off-chip loads can lead to significant delays because of the
disparity between on-chip and off-chip dimensions. When designing buffer
circuits, off-chip loads can be modeled in two ways: either as a purely
capacitive load or as a transmission line with some characteristic
impedance.

The strategies needed to drive each kind of load are slightly different but
result in similar circuitry and delays. Figure 2.7 shows buffers designed to
drive capacitive loads and transmission lines, including the parasitic
capacitances and inductances due to the packaging technology used. The
parasitics consist mainly of an on-chip capacitance, an inductance due to
the bonding wire and package leads, and an off-chip capacitance due to the
package and pc-board interconnect. The capacitive load simply adds a large
capacitor in parallel with the off-chip capacitance, while the transmission
line load adds a transmission line in parallel with the off-chip capacitance.

The device sizes used in the final stage of the buffer will be determined by
either the value of the load capacitance or the transmission line's impedance.
In MOS technologies, the delay of a gate is roughly proportional to the
fanout it must drive; a gate with a fanout of n will be about n times slower
than a gate with a fanout of 1. To obtain a fast rise time, the fanout seen
by each inverter must be fairly small, so the last inverter driving a
capacitive load must be very large. For the transmission line buffer, the last
pulldown transistor must have an on-resistance much less than the impedance
of the transmission line, on the order of tens of ohms or less. The large
drive inverter and pulldown must in turn be driven quickly, which requires
transistors only a few times smaller than the final drive devices. This
reverse scaling continues until the resulting devices can be driven quickly
by normal-size devices; typically this type of scaling requires around 4
stages of inverters.

Figure 2.7: MOS Buffers Designed to Drive Off-Chip Loads (a) Capacitive
Loads, (b) Transmission Line Loads

This buffering technique results in a buffer which has very fast rise times
but a significant delay, due to the latency required for signals to propagate
through the 4 or so stages of inverters. It can be shown that for capacitive
loads the minimum delay required to drive a large load, C_L, is proportional
to ln Y, where Y is the ratio of C_L to the gate capacitance of a
minimum-size inverter [24]; a similar result should hold for driving
transmission line loads. Because on-chip loads are scaling down much faster
than off-chip loads, Y is increasing. As device sizes scale down, the minimum
absolute delay required to drive off-chip loads decreases, but the delay
relative to on-chip gate delays increases.
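The ln Y dependence can be illustrated with the classic tapered-buffer argument. The sketch assumes, as the text does, that a stage's delay is proportional to its fanout: with N stages each f = Y^(1/N) times larger than the previous one, the total delay is N·f·τ, where τ is the delay of a minimum inverter driving an equal-size inverter. Minimizing N·Y^(1/N) over real N gives f = e and N = ln Y, i.e. a minimum delay of about e·τ·ln Y. The function name and the integer search over N are illustrative choices, not taken from [24].

```python
import math

def tapered_buffer(Y, tau=1.0, max_stages=20):
    """Choose the number of inverter stages N that minimizes total delay.

    Model (an assumption): each stage is f = Y**(1/N) times larger than the
    previous one, and each stage's delay is f * tau because delay is
    proportional to fanout. Total delay is then N * f * tau.
    """
    if Y <= 1:
        raise ValueError("load ratio Y must exceed 1")
    n_best = min(range(1, max_stages + 1), key=lambda n: n * Y ** (1.0 / n))
    f = Y ** (1.0 / n_best)
    return n_best, f, n_best * f * tau

for Y in (55, 1000):
    n, f, delay = tapered_buffer(Y)
    bound = math.e * math.log(Y)   # continuous optimum, in units of tau
    print(f"Y = {Y}: {n} stages, ratio {f:.2f}, delay {delay:.2f} tau "
          f"(e ln Y = {bound:.2f} tau)")
```

Under this model a load ratio of about 55 (ln 55 ≈ 4) gives the 4 or so stages quoted above, while Y = 1000 calls for 7 stages; the integer solution always sits slightly above the continuous bound e·τ·ln Y.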

The latency does not have to set the rate at which data is sent through
the pad, however; the input to the buffer can be clocked as quickly as the
delay through a single stage of the buffer allows. If the input changes while
an earlier transition is still propagating through the buffer, the buffer
acts like a delay line, and both transitions appear at the output with
approximately the same phase relationship they had at the buffer's input.
This corresponds to operating under the two-sided delay constraints.

Sensing  Small  Voltage  Swings 

A tradeoff between power dissipation and delay must be made when choosing
the voltage swings for off-chip signals. Provided the signals have short rise
times, the power required to drive off-chip capacitive loads is almost all ac
power; as such, it is proportional to C·V_s², where V_s is the maximum signal
swing. The power required to drive transmission lines is mainly due to the
current needed to pull down the terminating resistor; this introduces a
V_s²/Z_0 component to the power. In both cases, the power increases in
proportion to the square of the voltage swing of the inter-chip signals, which
gives a strong incentive for lowering the signal swing. Because of the
gain-bandwidth limitations inherent in active devices, lower signal swings
take longer to sense. Fortunately, because the power dissipation scales as
V_s², large savings can be achieved without seriously impacting the clock
frequency.
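A small numerical sketch of the quadratic dependence on swing. It assumes the capacitive load is fully charged and discharged once per cycle at frequency f (P = C·V_s²·f) and that the terminated line dissipates V_s²/Z_0 while pulled low, for a fraction of the time given by the hypothetical `low_fraction` parameter. Both are simplifying assumptions, and the component values are hypothetical.

```python
def capacitive_power(c_load, v_swing, freq):
    """AC power to charge and discharge c_load through v_swing once per cycle."""
    return c_load * v_swing ** 2 * freq

def terminated_line_power(v_swing, z0, low_fraction=0.5):
    """Average power in a z0-ohm termination pulled down by v_swing
    for low_fraction of the time."""
    return low_fraction * v_swing ** 2 / z0

# Hypothetical pad: 20 pF load at 50 MHz, or a 50-ohm terminated line.
for vs in (5.0, 2.5, 1.0):
    pc = capacitive_power(20e-12, vs, 50e6)
    pt = terminated_line_power(vs, 50.0)
    print(f"Vs = {vs:3.1f} V: capacitive {pc * 1e3:6.2f} mW, line {pt * 1e3:6.2f} mW")
```

In this model, halving the swing from 5 V to 2.5 V cuts both power components by a factor of four, while the sensing delay grows far more slowly.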

Transmission  Line  Delays 

The other major component of inter-chip delay is the delay due to pc-board
traces and inter-board wires. Treating wires and traces longer than a foot or
so as capacitive loads is probably not very feasible; to keep buffer sizes and
delays reasonable and independent of the actual loads, transmission line
loads are necessary. Signals propagate through these transmission lines well
below the speed of light in vacuum; typical delays are on the order of 1.5-2
nanoseconds per foot. Even signals which stay on a single pc board can easily
have inter-chip propagation times of 3-4 nanoseconds; inter-board signals can
have much longer transmission times.
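The arithmetic behind these numbers is simple but worth making explicit, because a flight time longer than the clock period means more than one bit is on the wire at once. The sketch uses the 1.5-2 ns per foot figures above; the trace lengths and the 10 ns clock period in the example are hypothetical.

```python
def flight_time_ns(length_ft, ns_per_ft):
    """Propagation delay of a trace or cable, in nanoseconds."""
    return length_ft * ns_per_ft

def bits_in_flight(length_ft, ns_per_ft, clock_period_ns):
    """How many bit periods the signal spends on the wire (may be fractional)."""
    return flight_time_ns(length_ft, ns_per_ft) / clock_period_ns

# Hypothetical cases: a 2 ft on-board trace and a 6 ft inter-board cable.
for length in (2.0, 6.0):
    lo, hi = flight_time_ns(length, 1.5), flight_time_ns(length, 2.0)
    print(f"{length} ft: {lo:.1f}-{hi:.1f} ns, "
          f"{bits_in_flight(length, 2.0, 10.0):.1f} bit periods at 10 ns")
```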


CHAPTER  2.  DELAY  CONSTRAINTS 




2.5  Noise  Considerations 

The term noise generally refers to signal perturbations which displace the
voltage at a node from its expected value. Offsets in a node's voltage can
change the delay associated with changing the node's logical state; in this
way, most noise signals can be considered to distort the delays present as
well as the voltage levels. This delay noise, or phase noise, is of more
interest in this thesis than voltage offsets, but voltage noise is usually
easier to think about and to quantify accurately. The discussions of noise at
various points will occasionally use both types of noise, under the
assumption that the two differ only in how the resulting distortion is
interpreted.

2.5.1  Sources  of  Noise 

Noise can be grouped into two main classes: transient and steady state.

Transient  Noise  Sources

Fast switching times for output signals require large current surges to
charge the large external capacitances; the bonding wires and package leads
act as inductors through which this current must flow, resulting in
significant voltage drops. Switching several outputs simultaneously can
result in noticeable noise on the power rails due to large power supply
current surges. Similarly, the current required to switch internal nodes must
often flow through polysilicon or diffusion interconnect, causing noticeable
voltage drops. Closely spaced parallel wires and wires which cross form
coupling capacitances; when one wire switches state, the other wire will tend
to move as well, especially if it is in a high-impedance state. Since the
largest capacitances couple nodes to the substrate, precharging large arrays
can shift the substrate voltage. Alpha particles can induce large charge
fluctuations which can temporarily or permanently change the states of nodes.
Temperature variations can arise from sudden changes in the type of
computation being performed or in the amount of communication occurring;
temperature-induced variations occur much more slowly than the other
transient noise perturbations. Finally, differences between frequency
generators can inject a continuous phase skew between signals controlled by
different clocks.

Steady  State  Noise  Sources 

Processing variations result in shifts in device characteristics which are
equivalent to steady-state voltage shifts. Steady-state noise voltages also
result from operating temperature variations and from manufacturing variations
in discrete components and pc-board characteristics, which cause power supply
offsets and voltage level variations for signals that depend on resistor
dividers.

So  far  all  of  these  noise  sources  have  been  discussed  as  voltage  variations; 
it  is  also  possible  to  model  noise  sources  as  causing  phase  variations.  Given 
the  definition  of  phase  margins,  any  perturbation  which  delays  an  edge  or 
changes  the  edge’s  rise  or  fall  time  will  cause  a  shift  in  the  phase  margin  of 
the  signal  with  respect  to  other  signals.  Modeling  noise  as  phase  variations 
is  much  more  useful  for  evaluating  the  issues  important  in  high-speed  data 
communication  problems  than  modeling  noise  as  voltage  variations. 

2.5.2  Probabilistic  Model  for  Noise 

Many of the noise sources just described can be modeled accurately, but the
complexity of such modeling makes a simple model desirable¹. A simple
probabilistic model proposed in [14] will be used here.

By studying a logic family and systems employing the family over a period of
time², the noise in the system can be characterized by a plot such as the
hypothetical one shown in Figure 2.8. Any given value of noise voltage, v_n,
or phase noise, t_n, will occur with some probability, P(v_n) or P(t_n).

Figure 2.8: Probability vs. Noise Voltage

¹Dynamic RAM designers routinely model all of the types of noise described
above, but at a great expense in design and simulation time.

²Or, more realistically, by performing a very careful analysis of the nature
of the noise sources once.

Given such probability functions and the desired probability of noise-induced
failure, the designer can determine the worst-case noise any node must be
able to tolerate, v_nMAX and t_nMAX. The designer must then ensure that all of
the logic gates' static and dynamic characteristics provide voltage and phase
margins greater than these worst-case noises. These worst-case noise
specifications will be used to simplify all subsequent discussions involving
noise.

2.5.3  Static  Noise  Immunity — Static  Noise  Margins

The independent specifications for a logic family's input and output voltages
provide a convenient mechanism for defining the relative noise immunity of the
logic family. The maximum amounts of noise the gates in a family can tolerate
on HIGH and LOW signals are given by the noise margins, V_NMH and V_NML
respectively [12]. These margins are:

$$V_{NMH} = V_{OH} - V_{IH} \tag{2.7}$$

$$V_{NML} = V_{IL} - V_{OL} \tag{2.8}$$

As long as the noise margins are greater than the worst-case noise,
V_NMH > v_nMAX and V_NML > v_nMAX, the probability of the circuit failing will
be acceptably low.
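To make the procedure concrete, the sketch below derives a worst-case noise voltage from a target failure probability and then checks Equations 2.7 and 2.8 against it. It assumes, purely for illustration, that the noise voltage is zero-mean Gaussian with standard deviation `sigma` and that `p_fail` is the acceptable probability of a single noise excursion beyond v_nMAX in either direction; the model in [14] does not require that shape, and the logic levels used are hypothetical TTL-like values.

```python
from statistics import NormalDist

def worst_case_noise(sigma, p_fail):
    """Noise amplitude exceeded (in either direction) with probability p_fail,
    assuming zero-mean Gaussian noise with standard deviation sigma."""
    return sigma * NormalDist().inv_cdf(1.0 - p_fail / 2.0)

def noise_margins(v_oh, v_ih, v_il, v_ol):
    """Equations 2.7 and 2.8: returns (V_NMH, V_NML)."""
    return v_oh - v_ih, v_il - v_ol

def margins_ok(v_oh, v_ih, v_il, v_ol, v_nmax):
    """True if both static noise margins exceed the worst-case noise."""
    nmh, nml = noise_margins(v_oh, v_ih, v_il, v_ol)
    return nmh > v_nmax and nml > v_nmax

# Hypothetical levels (V): V_OH = 2.4, V_IH = 2.0, V_IL = 0.8, V_OL = 0.4.
v_nmax = worst_case_noise(sigma=0.05, p_fail=1e-9)
print(f"v_nMAX = {v_nmax:.3f} V, ok = {margins_ok(2.4, 2.0, 0.8, 0.4, v_nmax)}")
```

The same pattern applies to phase noise: replace the voltage distribution with a distribution of t_n and compare the resulting t_nMAX with the available phase margins.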

The specification of the logic levels and noise margins is fairly arbitrary,
and even the choice of v_nMAX is somewhat arbitrary. This does not mean that
these choices will not affect the performance of the logic gates designed to
meet the specifications; it can be shown that the design of restoring logic
circuits involves a fundamental tradeoff between noise margin and circuit
delay [14]. Because of this coupling between a circuit's noise margins and its
delay, the choice of logic levels is in practice much more tightly linked to
the maximum allowable noise levels than this presentation indicates.

2.5.4  Phase  Jitter  Immunity — Dynamic  Phase  Margins

The static transfer function given earlier, v_out = H_s(v_in), is a
voltage-to-voltage relationship; as such, it is possible to check the
consistency of the specifications by interpreting the output voltages as input
voltages and ensuring that the resulting output voltages meet the
specifications for output voltages. The noise margins are exactly the amount
by which the output voltages could be degraded and still produce legal output
levels when applied to a gate. The same is not true of the dynamic transfer
function, v_out = H_d(t_pm): its output specification is given as a voltage,
while its input specification is given as a phase margin.
Defining a phase-margin-to-phase-margin specification would allow phase
noise margins to be defined directly in terms of the transfer function.
However, phase margins are only defined at the inputs to clocked gates, and
there will almost always be varying amounts of logic between successive
clocked latches, so defining a general phase-to-phase transfer function would
be difficult and of minimal utility. Instead, the maximum allowable phase
noise, t_nMAX, is used directly as the desired phase noise margin. The system
designer must ensure that all phase margins exceed the setup and hold times,
t_s and t_h, by at least t_nMAX in order to achieve a suitably low probability
of failure.


Chapter  3 


Dynamic  Delay  Adjustment 


In order for a digital system to be reliable, the design must either prevent
metastable states from occurring or make their probability of occurrence so
low that it can be ignored. The desire to use a simple synchronous design
model while still having a high clock frequency makes it imperative that the
two-sided delay constraints, Equation 2.6, be satisfied. In systems built from
TTL and ECL components, the amount of circuitry required had a strong effect
on the final performance and cost of the system; the design time required was
also important but could be amortized over a large number of systems. In such
systems, it therefore made sense to expend the design time required to
satisfy all of the delay constraints.

With the advent of VLSI technologies, some of these tradeoffs are changing.
Circuitry is now relatively cheap, and design time has become a much larger
factor in the final success of a machine. The approach presented here
reflects this fact. Dynamic Delay Adjustment (DDA) is a circuit-based
technique for doing exactly what the name implies: dynamically adjusting
delays to satisfy the synchronous delay constraints. By embedding knowledge
of how to satisfy delay constraints in a circuit that can be replicated
thousands of times with no additional design expense, all of the delays in a
system can be adjusted automatically and continuously.
Figure  3.1:  Phase  Shifting  of  a  Periodic  Signal  Due  to  Transmission  Delay

3.1  Variable  Delay  Phase  Adjustment 

The concepts behind the Dynamic Delay Adjustment synchronization technique
are illustrated in Figure 3.1, in which a periodic signal, F_1(t), with period
T_p is generated in one module and transmitted to another module over a path
with some delay t_d > T_p. If the received signal, F_1'(t), is compared with a
signal F_2(t) which is generated locally and is supposed to be identical to
F_1(t), F_1'(t) will appear to be shifted with respect to F_2(t) by only
(t_d mod T_p), due to the periodic nature of F_1(t):

$$F_1'(t) = F_1(t - t_d) = F_1\big(t - (t_d \bmod T_p)\big)$$

This is hardly a new observation; designers regularly use this fact when
designing clock distribution systems for synchronous machines, to ensure that
clock transitions occur at exactly the same time in all portions of the
machine.
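The identity above is easy to check numerically. The sketch uses a square wave as a stand-in for F_1(t) (any periodic signal would do) and compares the signal delayed by t_d with the signal delayed by only t_d mod T_p; the period and delay values are arbitrary illustrative choices.

```python
def square_wave(t, period):
    """A periodic test signal: HIGH for the first half of each period."""
    return 1 if (t % period) < period / 2 else 0

def delayed(signal, delay, period):
    """Return g(t) = signal(t - delay, period)."""
    return lambda t: signal(t - delay, period)

Tp = 4.0    # period of F1 (arbitrary units)
td = 9.0    # transmission delay, longer than Tp
full = delayed(square_wave, td, Tp)
reduced = delayed(square_wave, td % Tp, Tp)   # only td mod Tp = 1.0

times = [0.5 * k for k in range(16)]
print(all(full(t) == reduced(t) for t in times))   # prints True
```

Only the residual shift t_d mod T_p is observable, so a receiver comparing against a local copy needs to compensate for less than one period of delay.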

The two-sided delay constraints developed earlier also used this fact to allow
delays on data lines longer than the clock period, even though the data
signals are not truly periodic; in that case the designer had to determine the
number of clock cycles the data would spend traversing the wire and design
the logic to take the delay into account.

Most computers are constructed as many independent modules. In some cases,
there may be modules that are activated by requests from other modules and
are simply idle when no messages are being received. In such cases, the exact
number of clock cycles a message requires to go from one module to another
may not be critical¹.

This characteristic led to the idea of using a circuit similar to the one
shown in Figure 3.2(a) to adjust the phase of incoming signals automatically,
ensuring that the signals meet the delay constraints of the input circuitry.
This synchronizer employs a technique which will be called Dynamic Delay
Adjustment (DDA), since it is based on dynamically adjusting the delay seen by
the data signal in order to ensure that no synchronization errors occur.


3.2  Operation  of  the  DDA  Synchronizer 

The heart of the synchronizer is the variable delay line; for a system with a
clock period T_c, the delay line must be able to provide delays varying from 0
to T_c seconds. The variable delay line adjusts the phase of the incoming
signal before the signal is sampled by the latch; the rest of the synchronizer
must ensure that the output of the delay line does not violate the setup and
hold requirements of the latch.

The comparator performs a correlation between the delayed signal and the
reference signal produced by the reference generator; this correlation
produces a correction signal which is applied to the variable delay. The
filter adds an integral component to the correlation signal in order to make
the system more stable and easier to control. The Change-Enable signal lets
the internal chip circuitry control the characteristics of the filter in
order to prevent changes in the delay at critical times.

¹This is especially true of many multi-processor architectures.

Figure 3.2: Block Diagram of the Dynamic Delay Adjustment Circuit and One of
the Common Variations
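The feedback loop can be sketched at a behavioral level. The model below rests on several assumptions that are not in the text: time is measured in clock periods (T_c = 1); the comparator is idealized as reporting whether the delayed transitions sit early or late relative to the ideal point T_c/2 after the latching edge; the variable delay is emulated with n discrete taps, as in the multiple-delay variation described in Section 3.2.1; and the filter is an up/down counter that moves the delay only after a run of consistent votes and only while Change-Enable is asserted. All names and parameters are hypothetical.

```python
import random

def phase_error(phase):
    """Signed offset (in clock periods) of a transition from the ideal point 1/2."""
    return (phase % 1.0) - 0.5

def run_dda(input_phase, n_taps=4, threshold=8, cycles=200,
            jitter=0.0, change_enable=lambda cycle: True, seed=0):
    """Return the tap selected after `cycles` clock periods.

    input_phase: arrival phase of input transitions relative to the latching
    edge, in clock periods. Tap k adds a delay of k / n_taps periods.
    """
    rng = random.Random(seed)
    tap = 0                         # currently selected delay
    counter = 0                     # integrating filter (up/down counter)
    deadband = 1.0 / (2 * n_taps)   # half a tap spacing: do not chase noise
    for cycle in range(cycles):
        observed = input_phase + tap / n_taps + rng.uniform(-jitter, jitter)
        err = phase_error(observed)
        # Idealized comparator: vote only when clearly early or late.
        if err > deadband:
            counter -= 1            # transitions too late: want less delay
        elif err < -deadband:
            counter += 1            # transitions too early: want more delay
        # Filter output: step one tap after `threshold` net votes, if enabled.
        if abs(counter) >= threshold and change_enable(cycle):
            tap = (tap + (1 if counter > 0 else -1)) % n_taps
            counter = 0
    return tap

def margin(input_phase, tap, n_taps=4):
    """Distance (in clock periods) from the delayed transitions to the latch edge."""
    x = (input_phase + tap / n_taps) % 1.0
    return min(x, 1.0 - x)

for p in (0.0, 0.1, 0.6, 0.9):
    tap = run_dda(p)
    print(f"input phase {p:.2f}: tap {tap}, margin {margin(p, tap):.3f} Tc")
```

Stepping down from tap 0 wraps to the last tap, which shifts the sample by one whole cycle; this slip is acceptable precisely in the situations described above, where the exact number of cycles a message spends in transit is not critical.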

3.2.1  Multiple  Delay  Variation 

Figure 3.2(b) shows a modification of the DDA circuitry just discussed which
will be used in both of the implementation strategies discussed. In this
variation, the variable delay line is emulated by generating several delayed
versions of the input and then selecting among them². Provided the delay
values are spread evenly over the required range of T_c, one of the delay
lines should provide a delay that satisfies the delay constraints. This is not
a perfect emulation, since the multiple delays only provide a limited set of
discrete delays. The imperfect emulation is nevertheless attractive because of
the difficulty of generating, in MOS technologies, a very short delay that is
easily variable but also predictable and drift-free.

An important point to note regarding the multiple-delay version of the DDA
synchronizer is that some accuracy has been sacrificed by providing only a
limited number of discrete delays in place of one continuously variable delay.
Figure 3.3 compares the discrete and continuous adjustment approaches; in
either case, the goal is to place the input transitions as far from the
latching clock as possible in order to achieve the maximum noise immunity. The
ideal phase adjustment would make the sum of the delay, t_d, and the original
phase margin, t_pm, equal to T_c/2.

The plot in the figure assumes there are 4 delays with values spread evenly
between 0 and T_c, and shows how the phase adjustment can vary from the ideal
by as much as T_c/4 seconds. In the general case in which there are n delays,
the discrete delays may reduce the phase noise margin of the synchronizer by
as much as T_c/n. As a result, this technique can only guarantee a minimum
phase noise margin of:

$$t_{pM} \ge \frac{T_c}{2} - \frac{T_c}{n} = \frac{n-2}{2n}\,T_c \tag{3.1}$$

²The delay lines' outputs will also be referred to as taps for short; this
convention alludes to the fact that the various delays could be generated by
taking taps off a single analog delay line.
This minimum noise margin will be reduced further if the delay values are not
spread evenly over the clock period. The minimum guaranteed phase margin can
be found by summing all combinations of (n − 2)/2n adjacent delays; the phase
margin cannot be guaranteed to be larger than the minimum of these sums³.
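Equation 3.1 can be turned around to ask how many evenly spaced taps are needed for a required phase noise margin, for example the worst-case phase noise t_nMAX of Section 2.5.2. Rearranging (n − 2)/(2n)·T_c ≥ t_req gives n ≥ 2T_c/(T_c − 2t_req), which is only possible when t_req < T_c/2. The sketch below evaluates both directions for evenly spaced taps only; the function names and the 8 ns clock period are hypothetical.

```python
def guaranteed_margin(n, tc):
    """Equation 3.1: minimum phase noise margin with n evenly spaced taps."""
    return (n - 2) / (2 * n) * tc

def taps_needed(t_req, tc, max_taps=64):
    """Smallest n whose guaranteed margin (Eq. 3.1) is at least t_req."""
    if not 0 <= t_req < tc / 2:
        raise ValueError("required margin must lie in [0, Tc/2)")
    for n in range(2, max_taps + 1):
        if guaranteed_margin(n, tc) >= t_req - 1e-12:   # tolerate rounding
            return n
    raise ValueError("more than max_taps taps would be needed")

# Hypothetical 8 ns clock period.
for n in (4, 6, 8):
    print(f"n = {n}: guaranteed margin {guaranteed_margin(n, 8.0):.2f} ns")
print("taps for a 3 ns margin:", taps_needed(3.0, 8.0))
```

Because the bound only approaches T_c/2 asymptotically, each additional tap buys less margin than the one before it.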

3.2.2  A  PLL  In  Disguise? 

The description of the DDA synchronizer's operation makes the circuit sound
like a Phase Locked Loop (PLL), and the DDA synchronizer can indeed be
considered to be a PLL. However, the function of the DDA circuit differs from
that of a typical PLL in a very important way.

In most cases, the function of a PLL is to accept an incoming asynchronous
signal and phase-lock onto the clock frequency embedded in the signal, in
order to synthesize a clock with the same frequency and phase shift as a clock
signal that had been sent from the transmitter along the same path as the
data signal. This synthesized clock can then be used to extract the data from
the input reliably. This lets the receiver adjust for variations in both
frequency and phase, but it does not solve the synchronization problem, since
the phase of the decoded data with respect to the local clock will still be
unknown.

The DDA circuit, on the other hand, assumes that the data frequency is very
well controlled and only the phase is unknown. By shifting the phase of the
incoming signal to be in phase with the local clock, instead of adjusting a
clock to the incoming phase, the DDA synchronizer can address both the
decoding problem and the synchronization problem.

3.2.3  Communication  Protocol  Assumptions 

The input is assumed to be an unencoded serial bit stream with a frequency
equal to the local clock frequency; it is also assumed that a message-based
communication protocol is used on top of the serial bit stream. With this type
of protocol, all inter-chip communication is done by inserting the information
to be transmitted into a message which carries additional information enabling
the receiving chip to interpret the message. Any message may be requesting
data, returning results, signalling completion of a computation, or performing
some other function.

³Assuming the delay variation is small enough that all sums of (n − 2)/2n
adjacent delays are less than T_c/2 and all sums of ((n − 2)/2n) + 1 adjacent
delays are greater than T_c/2.

In most cases, the sending chip will transmit one message and then either wait
for a response or go on to perform some other function. There will almost
always be some amount of unused time between messages, if only because the
receiving chip requires some time to generate a response⁴. Between messages,
an idle pattern is transmitted. The idle pattern can be any periodic pattern
and may be chosen for a variety of reasons, for example low power consumption
or ease of generation and decoding. The receiving chip watches the input for a
break in the idle pattern, which signals the start of the next message.
Sections 3.3.2 and 3.4.2 will discuss how the idle pattern can affect the
performance of the synchronizer.
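A receiver's search for the break in the idle pattern can be sketched as follows, assuming the bits have already been sampled correctly and that the idle pattern's period (in bits) is known. Comparing each bit with the bit one period earlier works for any periodic idle pattern. The assumption that a message's first bit differs from the continuation of the idle pattern, so that the break is detected immediately, is a protocol choice made for this illustration and is not stated in the text; the function name is hypothetical.

```python
from typing import Optional, Sequence

def find_message_start(bits: Sequence[int], period: int) -> Optional[int]:
    """Index of the first bit that breaks a periodic idle pattern, or None.

    The stream is assumed to start in the idle state, so the first `period`
    bits define the pattern.
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    for i in range(period, len(bits)):
        if bits[i] != bits[i - period]:   # differs from the expected idle bit
            return i
    return None

idle = [0, 1] * 4                 # hypothetical alternating idle pattern
stream = idle + [1, 1, 0, 1]      # message begins with a bit that breaks it
print(find_message_start(stream, period=2))   # prints 8
```

If a message could begin with bits that happen to continue the idle pattern, this detector would report the start late, at the first mismatch.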

3.3  A  Digital  CMOS  DDA  Synchronizer 

MOS technologies have the nice property that on-chip characteristics track
well; device parameters may vary slightly from one portion of a chip to
another, but they do not vary greatly. The same cannot be said for variations
from chip to chip; parameters can and do vary noticeably from chip to chip,
even for wafers fabricated at the same time. These variations make designing
any components with absolute specifications, e.g. voltage references or
absolute delays, very difficult.

The approach described in this section skirts this issue somewhat; the problem
of generating the delays accurately is hidden by making some assumptions about
what types of clock signals are available. Justification of this assumption
will be left to Section 3.4.3, which discusses the issues associated with
generating and controlling the clock signals.

⁴Bidirectional lines and long serial messages may allow the receiver enough
time to formulate the start of a response before the first message is
finished, and to begin transmitting as soon as the line can be reversed. In
such cases the interval between messages may approach 0.

A block diagram of the digital DDA synchronizer would be almost identical to
the multiple-delay version shown in Figure 3.2(b). The only difference at this
level would be the absence of a reference signal generator block; the data is
assumed to be simple, unencoded data, so the reference is just the local
clock.

The synchronizer described in this section has been implemented as a 3 µm
CMOS circuit in order to demonstrate the feasibility of this approach. A
simple test chip has been fabricated and tested; Section 3.6 discusses testing
issues and results. The synchronizer as presently implemented is not useful
other than as a test vehicle for the basic idea; due to lack of time and
effort, and to some basic misconceptions and oversights during the design
phase, the circuitry is much more complex than necessary and requires too much
area to be widely used.

3.3.1  Digital  Delay  Lines  Using  Overlapping  Clocks 

Figure 3.4 shows the details of the digital delay lines and the associated
clock signals used to generate the delays required by the DDA synchronizer.
Although 4 clock signals and 4 delay lines are shown, 4 is not the only number
of clocks and delays that will work. Any number of delays can be used; 4
delays provided a reasonable tradeoff between noise immunity and circuit
complexity in the first implementation.

The circuitry required is fairly simple, consisting of cascaded dynamic
register cells similar to the dynamic latches discussed in Section 2.1. The
real heart of the delay lines is the set of clock signals used to clock the
latches; the clocks are somewhat unusual for MOS circuits because they are not
non-overlapping. Ideally, the clocks consist of one phase of the local clock
and 3 delayed versions of that clock, with the delays arranged so that their
falling edges are spaced evenly across the clock period. As will be seen, the
important characteristic of each clock is where its falling edge occurs with
respect to the local clock. Taking the falling edge of φ1 to be the end of the
clock cycle, the clock phases are numbered in reverse order from the order in
which their falling edges occur within the clock cycle.

Figure 3.4: The components and clocks used in implementing the digital delay
line for the DDA synchronizer

Describing the delay lines is complicated somewhat by the two different
interpretations possible for the operation of the dynamic latches. Section 2.1
described the dynamic latches as being level-triggered rather than
edge-triggered. This description makes it possible to view the cascaded
latches as delay lines which delay input transitions while the first latch's
clock is LOW but do not delay transitions otherwise. This interpretation of
the delay lines is motivated by the desire to emulate the actual analog delay
called for by the DDA synchronizer in Figure 3.2(a).

A  slightly  more  accurate  way  to  describe  the  latches’  outputs  is  falling 
edge  latched.  With  this  description,  the  first  latch  in  each  line  can  be  viewed 
as  sampling  the  input  signal  at  the  falling  edge  of  the  latch’s  clock.  The 
n different delay lines therefore sample the input signal n different times
during each clock cycle. The additional latches following each of the sampling
latches are needed to delay the samples taken early in the clock cycle until
the final sample is taken. This delaying action allows the selection circuitry
to access all of the samples at the same time; also, because the output
of each delay line is driven by a φ1 latch, the output of the synchronizer can
be guaranteed to look almost the same regardless of which of the delay line
outputs the MUX is choosing.
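The oversampling interpretation can be illustrated with a short Python sketch. It is an idealized model, not the circuit itself: it assumes four phases whose falling edges are spaced exactly Tc/4 apart, with φ1 falling at the end of the cycle as described above, measures time in units of the clock period, and takes a hypothetical input waveform `signal(t)`. The helper names `falling_edges` and `oversample` are illustrative only.

```python
from typing import Callable, Dict


def falling_edges(cycle: int, n: int = 4, period: float = 1.0) -> Dict[int, float]:
    """Falling-edge time of phases 1..n during `cycle` (idealized).

    Phase 1 falls at the end of the cycle; because the phases are numbered
    in reverse order of their falling edges, phase k falls
    (k - 1) * period / n earlier than phase 1.
    """
    end = cycle * period
    return {k: end - (k - 1) * period / n for k in range(1, n + 1)}


def oversample(signal: Callable[[float], int], cycle: int,
               n: int = 4, period: float = 1.0) -> Dict[int, int]:
    """Sample `signal` at every phase's falling edge in `cycle`.

    All n samples are returned together, mirroring how the extra latches in
    each delay line hold the early samples until the last one is taken.
    """
    return {k: signal(t) for k, t in falling_edges(cycle, n, period).items()}


def step_input(t: float) -> int:
    """Hypothetical input whose transition falls at 0.6 of the first cycle."""
    return int(t >= 0.6)


print(oversample(step_input, 1))  # {1: 1, 2: 1, 3: 0, 4: 0}
```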


Simplification  of  the  Delay  Line  Circuitry 


Interpreting  the  delay  lines  as  simply  oversampling  the  input  signal  rather 
than  actually  delaying  the  signal  opens  the  door  for  some  simplification  of 
the  delay  lines.  Figure  3.5  shows  2  ways  in  which  the  delay  line  circuitry  can 
be simplified; both examples use 4 delay lines, but similar approaches
could  be  used  for  any  number  of  delays. 

The circuit in Figure 3.5(a) uses 2 fewer latches than the original delay
lines. The two internal φ2 latches could be removed due to the
non-overlapping nature of the φ1 and φ3 clocks*. The φ1 latch following the φ4
latch cannot be removed, since doing so would allow transitions occurring
while both φ4 and φ1 are HIGH to propagate through both latches.

By modifying φ4 to make it non-overlapping with respect to φ1, the
extra latch can be removed also. The resulting delay line circuitry and
the modified clock signals are shown in Figure 3.5(b). This simplification
of the delay lines would complicate the clock generation somewhat, since φ2
and φ4 are no longer normal non-overlapping clocks.

A third simplification can be applied in addition to either of the 2
approaches just described. A slight modification to the interface between the
delay line outputs and the selection circuitry can make it possible to replace
the 4 φ1 latches by a single latch placed after the MUX. In the present
implementation, the inputs to the selection circuitry are latched by a divided-down
version of φ3 whose falling edge is aligned with φ3's falling edge. The
modified approach would instead latch the selection circuitry inputs with a
divided-down version of φ1; this will provide the same inputs to the selection
circuitry, except that the inputs are accepted half a cycle earlier. The dynamic
latch after the MUX latches the output to ensure the output obeys the local
clocking protocol.

*This approach assumes the clock signals are generated using the technique
proposed in Section 3.4.3, which results in n/2 pairs of non-overlapping clock
signals.

Figure 3.5: Two Strategies for Simplifying the Circuitry Required by the
Digital Delay Lines.

Figure 3.6: Circuitry for Performing the Delay Selection by Transition
Position Detection.

3.3.2 Choosing the Proper Delay by Transition Position Detection

The  discussion  in  this  section  pertains  equally  well  to  any  synchronization 
scheme  which  generates  several  delayed  versions  of  the  input  and  then  selects 
between  the  different  versions.  The  technique  presented  here  is  remarkably 
simple and is based on detecting when transitions occur by comparing the
outputs  of  adjacent  delay  lines. 

Figure  3.6  shows  the  circuitry  used  to  decide  where  within  the  clock  cycle 
the  data  transitions  are  occurring.  The  decision  circuitry  consists  of  5  static, 
bistable  latches,  4  exclusive-or  (XOR)  gates  and  4  4-input  NAND  gates.  The 
latches, and all of the control circuitry, are clocked by φ1s and φ3s, which
are divided-down versions of the non-overlapping local clocks;
these slower clocks will also be referred to as the control clocks. Figure 3.6
also shows the relationship between the local and control clocks; the figure
does not give an exact frequency ratio, r_c (the ratio of the control clock
period to the local clock period), since this ratio will be determined by the
characteristics of the bistable latch as described in Appendix A.

Operation  of  the  Selection  Circuitry 

The static latches are used to ensure that the outputs of the delay lines are
at logically legal states before the delay selection circuitry tries to interpret
the outputs. The last section claimed that only 4 delay lines were used,
so why are there 5 static latches? As will be seen shortly, although only 4
delays will be considered for use in adjusting the phase of the input, the 4
delays are not sufficient for making a delay selection if the input stream is not
known to have a data transition in every sample cycle. The fifth delay line
samples the input with φ1 and then delays the sampled value for 1 complete
clock period; this extends the range of sampling times to a full clock cycle
compared to the 3Tc/4 range available with only 4 delay lines.

At the falling edge of φ3s, the inputs to the bistable latches should be
the values sampled during the previous local clock cycle; further, these
values will have had Tc/2 seconds to propagate through the bistable latches.
During φ1s, the bistable latches will settle to legal states; if there was a data
transition during the sampled data cycle, it must have occurred in between
the falling edges of two of the clocks*. A data transition will divide the taps
into two contiguous groups: one group whose members latched a HIGH value
and another group which latched a LOW value. By XORing each pair of
adjacent taps, the transition point can be easily detected.

At most one of the XOR gates will have a HIGH output. If there are
no HIGH outputs, then there could not have been a data transition; this is
the situation the fifth delay line was added to catch. Had there only been
4 samples, the selection circuitry would be unable to distinguish between a
cycle in which there was no transition and one in which the transition
occurred between the falling edge of φ1 in the previous cycle and that of φ4.
If one of the XOR gates has a HIGH output, the transition occurred between
the falling edges of the two clocks which latched the input to generate the
values input to that XOR gate. This places the transition within the interval
between two clocks, which is as accurate a placement as this scheme can
produce.

*This discussion ignores the possibility of metastable outputs by assuming
the bistable latches have been given sufficient time to settle to a valid logic
value with a probability extremely close to 1.
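The transition-position logic reduces to XORing adjacent taps. The Python sketch below assumes, following the example in the next paragraph, that tap 1 holds the φ1 sample, taps 2 to 4 hold the φ2 to φ4 samples, and tap 5 holds the φ1 sample from the previous cycle (the fifth delay line). The function name is hypothetical, and the sketch simply raises an error in the case where more than one XOR output is HIGH.

```python
from typing import Optional, Sequence


def detect_transition(taps: Sequence[int]) -> Optional[int]:
    """Return i (1-based) when XOR(tap i, tap i+1) is the single HIGH output.

    Returns None when no XOR output is HIGH, i.e. no transition was seen.
    """
    xors = [a ^ b for a, b in zip(taps, taps[1:])]
    hits = [i + 1 for i, x in enumerate(xors) if x]
    if not hits:
        return None
    if len(hits) > 1:
        raise ValueError("more than one transition in one sampled period")
    return hits[0]


print(detect_transition([1, 0, 0, 0, 0]))  # 1: transition between phi1 and phi2
print(detect_transition([1, 1, 1, 1, 0]))  # 4: only the fifth tap reveals it
print(detect_transition([1, 1, 1, 1]))     # None: invisible with 4 taps alone
```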

Assume, for example, that the data transitions are occurring between φ1 and
φ2, with the result that taps 2, 3, 4 and 5 are always equal while tap 1 is
different. The XOR gate between taps 1 and 2 would be the only gate with
a HIGH output, indicating that the transition was between φ1 and φ2 as
required. The proper choice of delay is the one which places the
latching clock signal as far from the data transitions as possible. Due to
the limited accuracy of this technique, the data transition could actually be
arbitrarily close to either φ1 or φ2, so either φ3 or φ4 would be an equally
valid choice as the sampling clock.
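The rule of placing the sampling clock as far from the transitions as possible can be checked numerically. The sketch below is a brute-force illustration, not the selection logic itself: it assumes ideal falling edges at Tc/4 spacing (φ1 at the end of the cycle), measures time as a fraction of Tc on a circle, and reports the phases whose worst-case distance to a transition anywhere in the detected interval is largest. All names are hypothetical.

```python
from typing import Dict, List, Tuple

# Idealized falling-edge positions as fractions of the clock period.
EDGE: Dict[int, float] = {1: 1.0, 2: 0.75, 3: 0.5, 4: 0.25}


def circ_dist(a: float, b: float) -> float:
    """Distance between two points on a circle of circumference 1."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def worst_case_margin(phase: int, lo: float, hi: float, steps: int = 100) -> float:
    """Smallest distance from `phase`'s edge to a transition anywhere in [lo, hi]."""
    points = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(circ_dist(EDGE[phase], p) for p in points)


def best_phases(lo: float, hi: float) -> Tuple[List[int], float]:
    """Phases that maximize the worst-case margin, and that margin."""
    margins = {p: worst_case_margin(p, lo, hi) for p in EDGE}
    best = max(margins.values())
    return sorted(p for p, m in margins.items() if abs(m - best) < 1e-9), best


# Transition detected between phi2 (0.75) and phi1 (1.0):
print(best_phases(0.75, 1.0))  # ([3, 4], 0.25)
```

Either of the two returned phases leaves a guaranteed margin of Tc/4, which is the minimum guaranteed phase margin used later in Section 3.4.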

The outputs of the XOR gates are sufficient to indicate where the
transitions are occurring and also which delay line should be chosen. It still
remains to explain the function of the 4 NAND gates which are included in
the selection circuitry. Basically, the NAND gates perform almost no useful
function and should have been left out of the implementation. The gates are
interconnected so that 3 inputs to each NAND gate are the outputs of the
other 3 NAND gates and the fourth input is one of the XOR gate outputs.
The resulting circuit is called a 4-flop, and it has the characteristic that as
long as at least one of the inputs is HIGH, only one of the 4-flop's
outputs will be LOW.

This has the somewhat useful function of ensuring that, if for some reason
more than one XOR gate produces a HIGH output, only one of the outputs
of the selection circuitry would be HIGH. There are 2 ways more than one
XOR gate could produce a HIGH output: either 2 transitions occurred on
the input during the sample period due to distortion of the input signal, or
there was a noise glitch somewhere in the circuitry. If the data signals are
distorted enough that 2 of the XOR gates are going HIGH due to there being
2 transitions during the sampling period, then the synchronizer will probably
not function properly anyway; if the extra HIGH signals were produced by a
noise glitch, then the selection filter, not the selection circuitry, should
remove the glitch.

How  this  4-flop  came  to  be  in  the  design  is  clear  in  hindsight  but  how  the 
fact  that  it  performed  no  real  function  was  not  noticed  is  harder  to  explain. 
Basically, the 4-flop grew out of an earlier plan to use an analog correlation
technique which would have been used in conjunction with the analog
implementation of the synchronizer which is briefly described in Section 3.5.
The analog correlation technique required a 4-output mutual inhibition gate,
which is what the 4-flop is. When the correlation technique was changed, the
4-flop was not removed.

3.3.3 Some Additional Problems with the Implementation

Continuing in the spirit of the discussion of the 4-flop, some other
misconceptions concerning the operation of the synchronizer which were in place
during the implementation should be discussed before the rest of the
implementation is presented. Figure 3.7 shows an expanded view of the circuitry
used to implement one bistable latch and one XOR gate. The latch is
much as expected, except that the Q-bar output is followed by 2 inverters
which drive the inputs to the XOR gate. This approach of following the
dynamic latches by buffers which drove the inputs to gates, rather than
directly using the outputs of the latches, was used in several places in the
synchronizer. This approach was taken due to a misconception concerning
how the control clocks would be shaped; the original waveform assumptions
are shown in Figure 3.7 as φ1s and φ3s. Because the dynamic latches must
interface with the faster data handling section, it was assumed that φ3s could
only be HIGH for the last Tc/2 seconds of φ3s's period. Further, it was
assumed that φ1s would be generated in the same manner as φ3s and would
therefore only be HIGH for Tc/2 seconds also*. Such small duty cycles
severely restricted the fanout of any of the dynamic latches; rather than worry
about the clock period being limited by a dynamic latch with too much load,
the outputs of the latches were buffered.

Figure 3.7: An Illustration of the Circuitry Overhead Introduced by the
Clocking Misconception.

*It should already be obvious that not enough time was spent thinking out
the design of the synchronizer before the implementation was committed to
silicon.

The actual clocks used in the synchronizer are similar to the alternative
control-clock waveforms shown in the figure; the need for the bistable latches
to interface with the data handling section only requires that the falling edge
of φ3s occur before the rising edge of φ1. There is no limit on the duty cycle.
The longer duty cycle would alleviate the fanout problem associated with the
dynamic latches, making it possible to remove the extra inverters.

Another related problem with the current design grew out of the failure
to predict the performance of the bistable latches before designing the
implementation circuitry. Since the settling time which the latches would
require was not calculated ahead of time, the length of the control clock's
period was not known; in order to not have the clock period set by the rest
of the control circuitry*, the design was pipelined. Breaking the control
section into 3 stages complicated the design procedure, adding conceptual and
circuit complexity without any real gain in performance.

Most  of  the  room  for  improvement  in  the  synchronizer  implementation 
arises  from  these  2  wrong  assumptions.  Section  3.4.5  will  estimate  how  much 
the  area  could  be  reduced  by  unpipelining  the  design  and  slowing  down  the 
control  circuitry. 

3.3.4  The  Selection  Filter 

The data signal will always possess a certain amount of phase jitter due
to noise in the system. Such noise is typically assumed to have a normal
distribution and a zero mean value; given this type of distribution, there
is some probability that large noise spikes will occur occasionally even if
the rms value of the noise is very small. Although such noise signals may
occasionally cause a bit-time to be so short or long that sampling errors
occur, the selection circuitry must be able to ensure that the effects of the
noise do not continue to affect the performance of the synchronizer after the
noise has subsided.

Consider, for example, a situation in which the data transitions are
occurring in between φ3 and φ4 but are actually very close to φ3. The
synchronizer will choose tap 2 as the delay in this case. Let a large noise
signal be injected for a short time which moves a single bit's transition from
the nominal location near φ3 to after φ4. The XOR gates will interpret this as
meaning the proper tap is now tap 3; were the tap selection switched to tap 3
due to this single sample, problems could arise when the noise subsided and the
transitions returned to their nominal location near φ3. Metastable states
could be sent into the internal chip logic, causing errors to occur until the
selection was changed back to tap 2.

*This would not necessarily be bad anyway.

Figure 3.8: The Challenger/Candidate Selection Comparison Circuitry.

For this reason, the selections generated by the XOR gates are low-pass
filtered to remove transient delay choices due to transient data phase
changes. The filtering technique requires the XOR gates to make the same
selection several more times than any competing selection before the
selection in use is changed.

At any point in time there are 3, possibly different, selections active
in the synchronizer. The oldest and most important selection is called the
incumbent; this is the selection being used to control the MUX and is held by
the selection latch. The next oldest selection is called the candidate and the
newest selection is the challenger; these last two selections are the internal
state of the filter and the input to the filter (output of the 4-flop),
respectively.

The circuitry used to hold one bit of the candidate selection and perform
one bit of the Candidate/Challenger comparison is shown in Figure 3.8.
The left-hand portion of the circuitry is basically 2 dynamic latches clocked
by φ3s whose outputs are buffered and drive an XOR gate to produce the
Selection-Match-1 (SM1) signal. The outputs of the two dynamic latches
are also fed into a selector which is controlled by the Load (LD) signal. If
LD goes HIGH, the Challenger selection is loaded into the lower latch and
becomes the new Candidate; otherwise LD will be LOW and the Candidate
will be fed back in to restore the latch for the next comparison.

Figure 3.9: The UP/DOWN/HOLD State Register for the Selection Filter.

Also  involved  in  the  filter  is  a  4  bit  UP/DOWN/HOLD  shift  register 
which  acts  to  keep  score  for  the  selection  filtering  process.  The  circuitry  for 
one bit of the register is shown in Figure 3.9(a) and the manner in which 4
bits  are  cascaded  to  form  the  entire  register  is  shown  in  Figure  3.9(b).  The 
state  of  this  register  is  determined  by  the  number  of  bits  which  are  currently 
in  the  HIGH  state.  On  every  cycle  one  of  the  3  control  signals,  UP,  DN  or 
HOLD  must  be  raised  and  the  register’s  state  will  change  in  the  appropriate 
manner,  increasing  in  value,  decreasing  in  value  or  holding  the  same  value. 
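The register's value is simply the number of HIGH bits, i.e. a thermometer code. A minimal Python model follows. The bit ordering (ones filling from bit B0 upward, so that B0 is HIGH whenever the value is nonzero and the fourth bit is HIGH only at the value 4) is an assumption chosen to match how B0b and the fourth bit are used later in this section, and `shift_step` is a hypothetical name.

```python
from typing import List

WIDTH = 4


def shift_step(bits: List[int], op: str) -> List[int]:
    """One clock of a thermometer-coded UP/DOWN/HOLD shift register.

    bits[0] is B0. UP shifts toward the top bit, inserting a 1 at B0;
    DN shifts toward B0, filling the top bit with 0; HOLD keeps the state.
    For a thermometer code the value therefore saturates at 0 and WIDTH.
    """
    if op == "UP":
        return [1] + bits[:-1]
    if op == "DN":
        return bits[1:] + [0]
    if op == "HOLD":
        return list(bits)
    raise ValueError("op must be UP, DN or HOLD")


def value(bits: List[int]) -> int:
    """Register value = number of HIGH bits."""
    return sum(bits)


r = [0] * WIDTH
for _ in range(5):              # the fifth UP saturates at 4
    r = shift_step(r, "UP")
print(r, value(r))              # [1, 1, 1, 1] 4
r = shift_step(r, "DN")
print(r, value(r))              # [1, 1, 1, 0] 3
```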
Filter Operation


On each cycle of the sample clock, a new challenger selection will be
generated by the selection logic. This selection will be compared with the
candidate selection; if the two match, then an UP signal is sent to the shift
register, increasing the register's value. If the selections do not match, a DN
signal will be generated to decrement the register's value. If the data period
being examined did not contain a data transition, then none of the Select
inputs will be valid and the HOLD signal will be generated.

If the register's value is 0, then the challenger selection is loaded in as
the new Candidate. If the register's value is 4, then the Candidate is
considered to be a valid selection; the signal indicating the Candidate has been
qualified is gated by a control signal from the internal chip logic known as
Change-Enable (CE). The CE signal indicates that the internal logic is not
extracting information from the data stream and a change of delay
selection is acceptable. Assuming CE is HIGH, the Candidate becomes the
Incumbent.
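The filtering rules just described can be summarized in a small behavioural model. This Python sketch is unpipelined and abstracts the UP/DOWN/HOLD register as a saturating count from 0 to 4; the class and function names are hypothetical, selections are plain integers, and `None` stands for a data period with no transition (HOLD). Because a load happens on a mismatch seen while the count is already 0 (as the LDb equation below implies), this model needs 5 cycles to approve a first selection and 9 to follow a phase change, one more in each case than the 4 and 8 quoted for the pipelined hardware in Section 3.4.2.

```python
from typing import Optional


class SelectionFilter:
    """Behavioural, unpipelined sketch of the challenger/candidate filter."""

    FULL = 4  # register value at which the Candidate is qualified

    def __init__(self) -> None:
        self.count = 0                        # UP/DOWN/HOLD register value
        self.candidate: Optional[int] = None
        self.incumbent: Optional[int] = None  # selection driving the MUX

    def step(self, challenger: Optional[int], change_enable: bool = True) -> None:
        if challenger is None:
            pass                                         # HOLD: no transition
        elif challenger == self.candidate:
            self.count = min(self.count + 1, self.FULL)  # UP
        else:
            if self.count == 0:
                self.candidate = challenger              # LD: new Candidate
            self.count = max(self.count - 1, 0)          # DN
        if self.count == self.FULL and change_enable:
            self.incumbent = self.candidate              # Candidate -> Incumbent


def cycles_to_adopt(f: SelectionFilter, sel: int, limit: int = 100) -> int:
    """Feed `sel` every cycle; return the cycles until it is the Incumbent."""
    for n in range(1, limit + 1):
        f.step(sel)
        if f.incumbent == sel:
            return n
    raise RuntimeError("selection never adopted")


f = SelectionFilter()
print(cycles_to_adopt(f, 2))  # 5: first selection after reset
print(cycles_to_adopt(f, 3))  # 9: four DN, one load at 0, four UP
f.step(1)                     # a single noisy challenger...
print(f.incumbent)            # 3: ...does not displace the Incumbent
```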

Figure  3.10  shows  the  circuitry  used  to  generate  the  control  signals 
needed  by  the  selection  comparison  circuitry  and  by  the  UP/DOWN/HOLD 
register. The 4 new-Selection signals (newS1 etc.) are NORed together to
form the HOLD signal, while the 4 SM signals are NANDed together with
the externally provided Resetb signal to form an intermediate control
signal, SMallb. SMallb (Selection-Match-all-bar) and its complement, SMall,
are then NORed with HOLD to form the UP and DN signals, respectively.
Finally, the SMallb signal is NANDed with the negative-true version of the
least significant bit of the register, B0b, to generate the LDb signal.
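The gate-level equations above can be transcribed directly into Python and exercised on a few cases. This is a sketch for checking the logic, not a netlist. It assumes every signal is 0 or 1, that each SMi is HIGH when bit i of the Challenger and Candidate agree (consistent with the name Selection-Match), and that B0b is HIGH exactly when the register value is 0; the helper names are hypothetical.

```python
from typing import Dict, Sequence


def nand(*xs: int) -> int:
    return int(not all(xs))


def nor(*xs: int) -> int:
    return int(not any(xs))


def filter_controls(new_sel: Sequence[int], sm: Sequence[int],
                    resetb: int, b0b: int) -> Dict[str, int]:
    """Selection filter control signals, following the Figure 3.10 description."""
    hold = nor(*new_sel)          # no new selection -> HOLD
    smallb = nand(*sm, resetb)    # LOW only if all bits match and not in reset
    small = 1 - smallb
    up = nor(smallb, hold)
    dn = nor(small, hold)
    ldb = nand(smallb, b0b)       # LOW (load) on a mismatch while the value is 0
    return {"HOLD": hold, "UP": up, "DN": dn, "LDb": ldb}


print(filter_controls([0, 1, 0, 0], [1, 1, 1, 1], resetb=1, b0b=0))
# {'HOLD': 0, 'UP': 1, 'DN': 0, 'LDb': 1}
print(filter_controls([0, 1, 0, 0], [1, 0, 1, 1], resetb=1, b0b=1))
# {'HOLD': 0, 'UP': 0, 'DN': 1, 'LDb': 0}
```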

The filter is actually broken into 2 pipeline stages, and all of these
operations do not occur simultaneously. Explaining the pipelined operation
would not add much useful information, and the next implementation will
probably not be pipelined, so a combinatorial explanation of the filter has
been given.
Figure 3.10: Circuitry for Generating the Selection Filter Control Signals.

3.3.5  The  Selection  Latch  and  the  Delay  MUX 

The circuitry used to implement one bit of the selection latch and delay
MUX is shown in Figure 3.11. The fourth bit of the UP/DOWN/HOLD
register is NANDed with the Change-Enable signal to generate the Selectb
signal. This signal controls a dynamic latch whose data input is one bit of
the Candidate selection from the selection filter. As long as Selectb is LOW,
the Incumbent selection is updated to be equal to the Candidate selection;
when Selectb goes HIGH, the Incumbent is latched and will not change.

The  Incumbent  selection  in  turn  controls  a  dynamic  latch  whose  data 
input  is  the  output  of  the  delay  lines;  the  outputs  of  the  4  latches  which 
are  controlled  by  the  4  Incumbent  bits  all  directly  drive  the  output  of  the 
synchronizer. 


3.4  Additional  Design  Considerations 

3.4.1  What  Limits  the  Data  Rate? 


Since the aim of the DDA synchronizer was to allow high speed data
transfers between MOS chips, the bottom line in evaluating the synchronizer
implementation is the maximum reliable data rate. In discussions regarding
speed, the synchronizer can be divided into two distinct sections: the low
speed control section and the high speed data handling section.

Figure 3.11: The Circuitry for One Bit of the Selection Latch and the Delay
MUX.

The  data  rate  is  dependent  solely  on  the  clock  frequency  of  the  high 
speed  data  handling  section  of  the  synchronizer.  The  ultimate  limit  on 
the  clock  frequency  is  the  speed  at  which  the  dynamic  latches  can  turn 
on,  change  state  and  then  turn  off  but  the  practical  limit  will  turn  out 
to  depend  on  the  characteristics  of  the  multi-phase  clocks  and  the  noise 
immunity  requirements.  The  clock  rate  in  the  low  speed  section  will  depend 
on  the  settling  time  required  by  the  bistable  latches  but  this  clock  rate  has 
no  direct  effect  on  the  actual  data  rate. 

If the phase of the input signal were known exactly and there were no
noise, then the data rate would only be limited by the setup and hold
requirements of the dynamic latch. Assuming the system uses a 2-phase clock,
that each phase has a duty cycle of less than 50%, and that $t_s \ge t_h$, the
clock period would only be limited to $T_c > 2t_s$.

As discussed in Section 3.2.1, the phase is only known to within the
resolution of the discrete delays, or in this case the resolution of the clock
signals, and there is also noise that must be accounted for in order to assure
reliable operation. Assuming the minimum guaranteed phase margin of the
synchronizer is some fraction of the clock period, $t_{PM\min} = kT_c$, and the
maximum phase noise is $t_{n\max}$, the clock period must be chosen so that:

$$t_{PM\min} = kT_c \ge t_s + t_{n\max}$$

$$T_c \ge \frac{t_s + t_{n\max}}{k} \qquad (3.2)$$

This  relation  shows  how  increasing  the  accuracy  of  the  synchronizer  reduces 
the  overhead  which  must  be  introduced  into  the  clock  cycle  and  thereby 
increases  the  maximum  data  rate. 
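Equation (3.2) can be evaluated directly. The Python sketch below assumes $t_{PM\min} = kT_c$ with $0 < k \le 1/2$ (a transition can be at most half a period away from the sampling edge); the timing values in the second example are hypothetical, not measurements of this synchronizer.

```python
def min_clock_period(t_s: float, t_n_max: float, k: float) -> float:
    """Smallest T_c satisfying k * T_c >= t_s + t_n_max (Eq. 3.2)."""
    if not 0.0 < k <= 0.5:
        raise ValueError("phase-margin fraction k must lie in (0, 0.5]")
    return (t_s + t_n_max) / k


# Ideal case (exact phase, no noise): k = 1/2 recovers T_c > 2 t_s.
print(min_clock_period(t_s=1.0, t_n_max=0.0, k=0.5))    # 2.0
# Hypothetical values in ns, with a guaranteed margin of T_c/4.
print(min_clock_period(t_s=0.5, t_n_max=0.25, k=0.25))  # 3.0
```

Raising k (a more accurate synchronizer) directly shrinks the allowed clock period, which is the point made in the paragraph above.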

3.4.2 Effect of the DDA Synchronizer on Data Transmission Efficiency

As is the case with most performance measures, the maximum data rate,
$f_{d\max} = 1/T_c$, is only half of the story. The real measure of the
synchronizer's performance is the effective data rate, $f_{d\mathrm{eff}}$. Ideally,
$f_{d\max} = f_{d\mathrm{eff}}$; this section will consider how the noise present in
real systems limits the amount of useful data which can be transmitted,
causing $f_{d\max} > f_{d\mathrm{eff}}$.

The loss of transmission efficiency is caused by the breakdown of the
assumption of an absolutely controlled data frequency. The discussion of
noise sources in Section 2.5.1 briefly mentioned two types of low frequency
noise  which  can  introduce  distortions  which  are,  or  appear  to  be,  frequency 
shifts:  data  dependent  thermal  variations  and  frequency  shifts  due  to  crystal 
variations. 

Both of these noise sources will be considered in more detail shortly,
but for now, assume the local clock has a period of exactly $T_c$ seconds
but the incoming data signal has a slightly shorter period of $T_c - \delta_c$. The
incoming data will appear to be distorted by a continuous noise signal which
is injecting a phase skew of $-\delta_c$ seconds per local clock period. This phase
noise is cumulative in nature; assuming the input transitions are initially
occurring between φ2 and φ3, the phase margin of the input with respect
to φ3* will decrease by $\delta_c$ every clock cycle. The synchronizer will not see
any skew in the input until the input skews completely past φ3. Once the
skew reaches these proportions, the synchronizer must be able to respond
and change the delay selection before the transitions skew far enough to
approach the sampling clock signal.

*The first local clock which follows the input transitions.

How quickly the synchronizer can respond to phase skew depends on the
characteristics of the selection filter. For instance, the current
implementation requires a minimum of 4 cycles of the control clocks to approve a
selection following a reset of the synchronizer but would require a minimum
of 8 cycles in order to respond to a change in phase. Let $N_F$ be defined to
be the minimum number of cycles of the control clock required to change
the delay selection; for the current synchronizer $N_F = 8$.

Combining the skew rate, $\delta_c$ in seconds per data cycle, and $N_F$ with the
ratio of the control clock period to the data clock period, $r_c$, and the
minimum guaranteed phase margin, $t_{PM\min}$, gives expressions for the
maximum allowable skew for a given $N_F$ or the maximum allowable filter delay
for a given skew. The filter needs $N_F r_c$ data cycles to respond, during
which the transitions drift by $N_F r_c \delta_c$; this drift must not exceed
$t_{PM\min}$:

$$\delta_{c\max} = \frac{t_{PM\min}}{N_F\,r_c} \qquad (3.3)$$

$$N_F \le \frac{1}{r_c}\,\frac{t_{PM\min}}{\delta_c} \qquad (3.4)$$

For the present implementation $t_{PM\min} = T_c/4$ and $N_F = 8$; the control
clock ratio $r_c$ is not absolutely set, but Appendix A suggests $r_c = 12$ to
obtain a sufficiently low probability of failure. Using these values,
$\delta_{c\max}$ can be calculated:

$$\delta_{c\max} = \frac{T_c/4}{8 \times 12} = \frac{T_c}{384} \approx 2.6 \times 10^{-3}\,T_c$$
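Equations (3.3) and (3.4) in code form, as a Python sketch: all times are normalized to $T_c$, and the second example plugs in the crystal-limited skew rate of $10^{-4}T_c$ per cycle discussed later in this section. The function names are hypothetical.

```python
import math


def max_skew_rate(t_pm_min: float, n_f: int, r_c: float) -> float:
    """Eq. (3.3): largest skew per data cycle the filter can follow."""
    return t_pm_min / (n_f * r_c)


def max_filter_cycles(t_pm_min: float, r_c: float, delta_c: float) -> int:
    """Eq. (3.4): largest whole number of control cycles N_F for a skew rate."""
    return math.floor(t_pm_min / (r_c * delta_c))


T_C = 1.0  # normalize all times to the data clock period
print(max_skew_rate(T_C / 4, n_f=8, r_c=12))             # about 2.6e-3 (= 1/384)
print(max_filter_cycles(T_C / 4, r_c=12, delta_c=1e-4))  # 208
```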


Sources  of  Low  Frequency  Noise 

A brief discussion of how the type of skew just described can arise is in
order. There are at least 2 types of noise which can cause short- or long-term
frequency skew: frequency variation between the local and remote clocks
and thermally induced phase variations.
If a single crystal oscillator is used to generate the basic clock frequency
for the entire machine, then the data frequency would be exactly the same
everywhere. However, in order to increase the reliability of the machine,
several oscillators might be used. Although crystal oscillators are very
accurate and well controlled, no 2 crystals will be exactly the same, so some
slight frequency variation will occur.

Crystal oscillators with frequency tolerances of 0.01% provide frequency
control tight enough to limit the skew rate to $\delta_c \approx 10^{-4}T_c$ seconds per
cycle. This amount of variation is well within the theoretical capability of
the synchronizer.

Thermal variations can also produce phase shifts by changing the delay
of the off-chip drivers and receivers. Most delays in MOS chips are inversely
proportional to the electron and hole mobilities of the devices; the
temperature dependence of the mobilities can be expressed as:

$$\mu(T_2) = \mu(T_1)\left(\frac{T_1}{T_2}\right)^{M} \qquad (3.5)$$

where $T_1$ and $T_2$ are absolute temperatures and $M$ is between 1 and 2,
typically taken to be 1.5. For instance, if the chip temperature increased
from a room temperature of 20°C to a relatively low operating temperature of
50°C, the mobilities would decrease by a factor of 0.86 and the delays would
increase by roughly 15%.
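The numbers in this example follow from Equation (3.5); the Python sketch below reproduces them, assuming delay scales as the inverse of mobility and converting Celsius to kelvin. The function name is hypothetical.

```python
def mobility_ratio(t1_c: float, t2_c: float, m: float = 1.5) -> float:
    """Eq. (3.5): mu(T2) / mu(T1) = (T1 / T2) ** m, with T in kelvin."""
    t1 = t1_c + 273.15
    t2 = t2_c + 273.15
    return (t1 / t2) ** m


ratio = mobility_ratio(20.0, 50.0)
delay_increase = 1.0 / ratio - 1.0  # delay assumed proportional to 1/mobility
print(f"mobility x{ratio:.3f}, delay +{100 * delay_increase:.1f}%")
# mobility x0.864, delay +15.7%
```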

Steady state thermal variation can be accommodated by allowing some
reasonable warmup time after the machine is initially powered up. There is
another type of thermal variation which cannot be handled so easily: data
dependent thermal variations. The discussion of delay in Section 2.4.2
mentioned that the capacitive nature of most loads in MOS systems introduced
an important $CV^2f$ component to the power dissipation. One point that
was not mentioned at that time was that the $f$ in the power equation is
really the frequency of signal transitions rather than the clock frequency. Since
the data being transmitted between chips is not encoded in any way,
consecutive 1's or 0's in the data will not produce any data transitions, so the
frequency of transitions will be less than the clock frequency whenever actual
data, as opposed to an idle pattern, is being transmitted. During operation,
the transition frequency seen on any individual line will be dependent on
the actual data being transmitted and will vary greatly; the power being
dissipated by the buffer driving the line will also vary greatly.
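Because the dynamic power follows the transition rate rather than the clock rate, the relative power drawn by a pad driver can be estimated by counting transitions. The Python sketch below assumes an alternating 0101... idle pattern (one transition per bit, the maximum) as the reference and equal C and V for both cases; the idle pattern and the data word are hypothetical examples, as are the function names.

```python
from typing import Sequence


def transitions(bits: Sequence[int]) -> int:
    """Number of level changes between consecutive bits."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)


def activity(bits: Sequence[int]) -> float:
    """Transitions per bit interval; 1.0 for an alternating pattern."""
    return transitions(bits) / (len(bits) - 1)


idle = [0, 1] * 4                # hypothetical alternating idle pattern
data = [1, 1, 1, 0, 0, 1, 0, 0]  # hypothetical unencoded data
# Relative dynamic (CV^2 f) power of the driver, data vs. idle:
print(activity(data) / activity(idle))  # 3/7, about 0.43
```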

The substrate will conduct the heat away from the pad driver, but because
the power density will be much greater around the pad drivers, the
temperature will be higher there also. The data dependent fluctuations in the
power being dissipated may also cause larger temperature fluctuations around
the off-chip buffers than in the rest of the chip. The large delays of the drivers
and receivers relative to the clock period make even a 10-20% increase in delay
an important reduction in the noise margin. No estimate of the skew rate
$\delta_c$ that data dependent thermal variations can generate has been attempted.

When  Can  Delay  Changes  Be  Made? 

Assume that a situation exists in which data transitions are occurring between the falling edges of $\phi_3$ and an adjacent phase, and the signal is skewing towards $\phi_3$. The sampling clock will originally be $\phi_1$, but when the input skews past $\phi_3$, the selection circuitry will eventually decide to change the sampling clock to $\phi_4$. When the switch in sampling clocks occurs, 2 scenarios are possible. Assume first that the MUX switches within one data clock cycle, and define $t_i$ to be the time of the falling edge of $\phi_1$ which produced the last output bit sampled with $\phi_1$. On the next cycle, the output bit will have been sampled by the falling edge of $\phi_4$, which occurred at $t_i + T_c/4$ seconds; this sample was actually the same input bit that the last $\phi_1$ sample saw. The switch in sampling clocks introduced an extra bit into the data stream; just the opposite will occur for a switch from $\phi_4$ to $\phi_1$: one bit will be missed.
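The duplicated and missed bits can be reproduced with a toy model of the output stage. The sketch assumes, hypothetically, that each output cycle takes the most recent sample from the selected delay line, and that $\phi_4$'s sample for a given output cycle was taken $T_c/4$ after the $\phi_1$ sample of the previous cycle, as in the scenario above. The offsets, the transition position and the function name are illustrative choices, not taken from the actual circuit.

```python
import math

# Hypothetical sample offsets, in units of T_c, relative to the output cycle
# that uses them: phi_4's sample lands T_c/4 after the previous phi_1 sample.
SAMPLE_OFFSET = {"phi1": 0.0, "phi4": -0.75}

def synchronizer_output(bits, selection, transition_offset=-0.4, period=1.0):
    """Return the input bit seen in each output cycle.

    bits[j] is assumed valid on [j*period + transition_offset,
    (j+1)*period + transition_offset); selection[m-1] names the sampling
    phase used in output cycle m (cycles are numbered from 1).
    """
    out = []
    for m, phase in enumerate(selection, start=1):
        t = m * period + SAMPLE_OFFSET[phase] * period
        j = math.floor((t - transition_offset) / period)
        if j < 0:
            raise ValueError("sample taken before the first input bit")
        out.append(bits[j])
    return out

bits = list(range(10))  # label each input bit by its index
print(synchronizer_output(bits, ["phi1"] * 3 + ["phi4"] * 3))
print(synchronizer_output(bits, ["phi4"] * 3 + ["phi1"] * 3))
```

The first switch (phi_1 to phi_4) repeats input bit 3; the second (phi_4 to phi_1) skips it, matching the extra-bit and missed-bit cases described above.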

Another possibility is that the switch will not occur within one data clock cycle but rather within one control clock cycle. In this case, $r_c$ data bits may be missed and/or be output as illegal logic states due to a slow switching time.

For both of these reasons, the internal logic must have some control over when changes in the delay selection occur. This was the reason for including the Change_Enable (CE) signal. When useful data is being transmitted, the internal logic must lower CE in order to ensure that delay changes do not cause extra or missed bits.

The inevitability of losing bits or producing extra ones means that there must be some times when no data is being transmitted, so that delay selection changes can be made. Further, the idle time must occur frequently enough to ensure that the delay selection can always be changed before the signal skew causes phase margin problems; this limits the length of contiguous messages.

The situation is further complicated by the fact that the data within the messages cannot be relied on to generate transitions for the selection circuitry*. The expressions for $\delta_{C\max}$ and $N_{F\min}$ assumed that data transitions were observed on every cycle of the control clocks. A more conservative approach is to assume that no data transitions will be observed by the delay selection logic and then to determine the percentage of the bandwidth that must be allocated to between-message idle time in order to ensure proper synchronizer operation for some given $\delta_C$, $N_F$, $r_c$ and $t_{PM\min}$.

Let $k_s$ be the number of data cycles required for the skewing signal to reduce the phase margin to 0; $k_s$ will be given by:

Within every $k_s$ data cycles the control circuitry must observe at least $N_F$ data transitions; ensuring this requires $N_{pre}$ idle-pattern data cycles. These idle-pattern bits reduce the effective data frequency:

$$f_{d,\mathrm{eff}} = f_{d,\max}\left(1 - \frac{N_{pre}}{k_s}\right)$$

Assuming the thermally induced skews will be less than the skews induced by the frequency difference, the current synchronizer has a maximum effective data rate of:

$$f_{d,\mathrm{eff}} = \frac{1}{T_c}\left(1 - \frac{N_{pre}}{k_s}\right) \approx 0.96\,f_c.$$

* Most data transmissions will contain some type of parity or checksum information to help guard against transmission errors; these techniques will ensure that the data signal has some minimum transition frequency. This minimum frequency will translate into some minimum average frequency of observed transitions.
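Since $N_{pre}$ of every $k_s$ data cycles must carry the idle pattern, only the remaining fraction $1 - N_{pre}/k_s$ of the cycles carry data. The sketch below evaluates that relation. The 50 MHz maximum rate and the choice of 16 idle cycles in every 400 are hypothetical, picked only so that the overhead matches the 4% implied by the 0.96 figure; they are not the thesis's parameters. The `max_message_cycles` helper assumes a simple framing of one message followed by its idle run in each $k_s$-cycle window.

```python
def effective_data_rate(f_dmax, n_pre, k_s):
    """f_deff = f_dmax * (1 - n_pre / k_s): n_pre idle cycles per k_s cycles."""
    if not 0 <= n_pre < k_s:
        raise ValueError("need 0 <= n_pre < k_s")
    return f_dmax * (1 - n_pre / k_s)

def max_message_cycles(n_pre, k_s):
    """Longest contiguous message if every k_s-cycle window must also hold
    n_pre idle-pattern cycles (hypothetical framing: message, then idle)."""
    return k_s - n_pre

# Hypothetical numbers: 50 MHz maximum data rate, 16 idle cycles per 400.
print(effective_data_rate(50e6, 16, 400), max_message_cycles(16, 400))
```

The same relation shows the trade-off named earlier: a faster skew shrinks $k_s$, which both lowers the effective rate and shortens the longest allowable message.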


3.4.3  Clock  Generation  and  Control  Issues 

In the digital implementation presented in this chapter, the delays and the resulting phase margins depend on the relationships between the falling edges of the sampling clocks. Specifically, the time differences between the falling edge of each clock and the falling edge of $\phi_1$ will be the delay values, and since only 4 clock phases were used in the actual implementation, the minimum phase margin would be the smallest time difference between the 4 falling clock edges. The clocks are intended to be spaced exactly $T_c/4$ seconds apart; the actual phase margin will depend on how accurately the clock phases are generated. This section will consider one method of generating the clock edges and the sensitivity of the resulting phase margin to various types of clock distortion.

Figure 3.12 shows one simple method of generating the 4 sampling clocks starting from two clock signals, $\phi_A$ and $\phi_B$. $\phi_A$ is a single-phase clock with a period of $T_c$ and a duty cycle of $t_{DO}/T_c$; $\phi_B$ is exactly identical to $\phi_A$ but is offset (delayed) with respect to $\phi_A$ by $t_{OFF}$ seconds. The input clocks are inverted to generate $\overline{\phi_A}$ and $\overline{\phi_B}$*; the delay of the inverters is assumed to be $t_{INV}$. Each pair of cross-coupled NOR gates splits one of the input clocks into two non-overlapping clock phases; the delay of the NOR gates is assumed to be $t_{NOR}$. The resulting 4 clock phases are also shown in Figure 3.12.

Taking the rising edge of $\phi_A$ to be $t_0 = 0$ seconds, the falling edges of the 4 clocks will occur at:

$$\begin{aligned}
t_1 &= t_{OFF} + t_{DO} + t_{INV} + t_{NOR}\\
t_2 &= t_{DO} + t_{INV} + t_{NOR}\\
t_3 &= t_{OFF} + t_{NOR}\\
t_4 &= t_{NOR}
\end{aligned}$$

* This section does not discuss problems associated with generating the complements of the clocks.

Figure 3.12: One Simple Method for Generating the 4 Sampling Clocks from 2 Input Clocks.


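The ideal values for $t_{DO}$ and $t_{OFF}$ are $T_c/2$ and $T_c/4$ respectively; the actual values will vary somewhat from the ideals, resulting in $t_{DO} = T_c/2 + \delta_{DO}$ and $t_{OFF} = T_c/4 + \delta_{OFF}$. Plugging these values into the equations for $t_1$ through $t_4$ and then taking the differences between adjacent signals† gives:

$$\begin{aligned}
t_1 - t_2 &= \frac{T_c}{4} + \delta_{OFF}\\
t_2 - t_3 &= \frac{T_c}{4} + \delta_{DO} - \delta_{OFF} + t_{INV}\\
t_3 - t_4 &= \frac{T_c}{4} + \delta_{OFF}\\
t_4 - t_1 + T_c &= \frac{T_c}{4} - \delta_{DO} - \delta_{OFF} - t_{INV}
\end{aligned}$$

This shows that the initial clocks' duty cycle and the offset between the clocks both directly affect the synchronizer's minimum phase margin.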
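The edge-spacing analysis is easy to check numerically. The sketch below computes $t_1$ through $t_4$ from the formulas for the circuit of Figure 3.12 and returns the four adjacent spacings; the parameter names mirror the symbols in the text, and the example values (in nanoseconds) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ClockParams:
    t_c: float        # data clock period T_c
    delta_do: float   # error in t_DO relative to its ideal value T_c/2
    delta_off: float  # error in t_OFF relative to its ideal value T_c/4
    t_inv: float      # inverter delay t_INV
    t_nor: float      # NOR gate delay t_NOR

def falling_edges(p: ClockParams):
    """Falling-edge times t1..t4, measured from the rising edge of phi_A."""
    t_do = p.t_c / 2 + p.delta_do
    t_off = p.t_c / 4 + p.delta_off
    t1 = t_off + t_do + p.t_inv + p.t_nor
    t2 = t_do + p.t_inv + p.t_nor
    t3 = t_off + p.t_nor
    t4 = p.t_nor
    return t1, t2, t3, t4

def adjacent_spacings(p: ClockParams):
    """(t1-t2, t2-t3, t3-t4, t4-t1+T_c): the four edge-to-edge spacings."""
    t1, t2, t3, t4 = falling_edges(p)
    return (t1 - t2, t2 - t3, t3 - t4, t4 - t1 + p.t_c)

# Hypothetical example in nanoseconds: T_c = 20 ns with some clock distortion.
p = ClockParams(t_c=20.0, delta_do=0.5, delta_off=-0.25, t_inv=0.5, t_nor=1.0)
spacings = adjacent_spacings(p)
print(spacings, min(spacings))
```

Note that $t_{NOR}$ cancels out of every spacing, while $t_{INV}$ does not; with these values the smallest spacing, and hence the minimum phase margin, shrinks from 5 ns to 4.25 ns.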


Generating $\phi_A$ and $\phi_B$

The generation of $\phi_A$ and $\phi_B$ must be based on some technique which does not require accurately controlled MOS gate delays. One approach to the problem is to take advantage of the availability of predictable delays in bipolar technologies; very accurately controlled ECL and TTL delay lines are available. A single crystal oscillator could be used to generate $\phi_A$, which would be distributed to all of the system; then an ECL delay line could be used to delay $\phi_A$ to generate $\phi_B$ locally‡. The two major problems with this approach are controlling the distribution of $\phi_A$ and $\phi_B$ on each board and, more importantly, controlling the distortion of the signal during the ECL-to-CMOS level conversion stage. This approach would also limit the flexibility in choosing the clock frequency during the debug and testing stages of the design.

† The difference between $\phi_1$ and $\phi_4$ is found from $t_4 - t_1 + T_c$.

A second possibility for generating $\phi_A$ and $\phi_B$ is also based on delaying $\phi_A$ by a well-controlled amount; rather than use an active delay line, the passive delay inherent in the pc board traces could be used. The transmission speed along the pc traces depends only on the electromagnetic properties of the pc board materials and traces. $\phi_A$ could be distributed to every chip, and the $\phi_A$ and $\phi_B$ inputs could be interconnected in such a way that $\phi_A$ will take $T_c/4$ seconds to travel from the $\phi_A$ input to the $\phi_B$ input. No additional circuitry would be required to generate $\phi_B$. This approach is as feasible as the first; its main drawback is that, once again, the signals will be subject to distortion during the signal conversion and buffering stages. Also, there would be no way to vary the frequency once the pc boards were fabricated.
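To get a feel for the trace lengths involved, the sketch below computes the length of board trace needed for a $T_c/4$ delay. It assumes an idealized lossless TEM line, for which the propagation velocity is $c/\sqrt{\varepsilon_{\mathrm{eff}}}$; the 20 ns clock period and the effective permittivity of 4 are hypothetical values, not figures from this design.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s

def trace_length_for_delay(delay_s, eps_eff):
    """Length of a lossless TEM trace with effective relative permittivity
    eps_eff that delays a signal by delay_s seconds."""
    return C0 / math.sqrt(eps_eff) * delay_s

# Hypothetical: T_c = 20 ns, so phi_B must lag phi_A by T_c/4 = 5 ns.
print(trace_length_for_delay(20e-9 / 4, eps_eff=4.0))  # about 0.75 m
```

Roughly three quarters of a metre of trace per chip would be needed in this example, which also makes clear why the delay, and hence the clock period, is frozen once the boards are built.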

A third technique would be to distribute a clock signal with a period of $T_c/2$ and divide the signal down to the proper clock frequency on-chip. The main advantages of this technique are that only a single I/O pin would be needed for the clock signal and no extra off-chip circuitry or pc board complexity would be required. The additional on-chip circuitry would be minimal, since some type of frequency division is going to be required anyway to generate the slower clocks for the control section of the synchronizer. Generating all of the clocks by frequency division followed by phase splitting might also make the alignment of the slow and fast clock edges easier. This technique would allow varying the clock period to facilitate debug and testing without limiting the final speed of the system.

‡ Locally would probably mean one delay line per board, or at least one per several chips.

The frequency division could be performed in such a way that rising edges of the clock caused $\phi_A$ transitions and falling edges caused $\phi_B$ transitions. This would give an offset between $\phi_A$ and $\phi_B$ equal to the high time of the original clock signal (its duty cycle times its $T_c/2$ period), which ideally would equal $T_c/4$, but this offset would still be susceptible to distortion of the original clock by the level conversion circuitry.
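The effect of dividing on both edges can be sketched with idealized toggle flip-flops. In this sketch, which is a hypothetical model assuming zero-delay toggles that both start LOW, $\phi_A$ toggles on every rising edge of the distributed clock and $\phi_B$ on every falling edge. The function name and the 20 ns example period are illustrative only.

```python
def divided_clock_edges(half_period, high_time, n_cycles):
    """Edges of phi_A and phi_B from toggling on the input clock edges.

    Hypothetical ideal toggle flip-flops (zero delay, both starting LOW):
    phi_A toggles on every rising input edge, phi_B on every falling edge.
    Returns (phi_a_rises, phi_a_falls, phi_b_rises).
    """
    rises = [k * half_period for k in range(2 * n_cycles)]
    falls = [t + high_time for t in rises]
    phi_a_rises = rises[0::2]  # 1st, 3rd, ... rising edges set phi_A HIGH
    phi_a_falls = rises[1::2]  # 2nd, 4th, ... rising edges set it LOW
    phi_b_rises = falls[0::2]
    return phi_a_rises, phi_a_falls, phi_b_rises

# T_c = 20 ns, so the input period is 10 ns; its high time is distorted to 6 ns.
a_r, a_f, b_r = divided_clock_edges(10.0, 6.0, 3)
print("offset:", b_r[0] - a_r[0], "phi_A high time:", a_f[0] - a_r[0])
```

In this idealization the division makes the duty cycle of $\phi_A$ exactly 50% regardless of the input, but the $\phi_A$-to-$\phi_B$ offset simply copies the input's high time, so a distorted input shows up directly as $\delta_{OFF}$.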

One argument against the on-chip division technique is that if the clock division circuitry can operate at double the data rate, the data period could be cut in half. While the clock frequency is ultimately limited by the switching speed of the dynamic latches, the more realistic limit will be the delay through 2 or more stages of arbitrary logic. The frequency division parts of the circuitry would be fairly simple, so operation at double the clock rate is feasible, but no design has been attempted to confirm this claim.

None of the clock generation techniques is immune to distortion of the sampling clock signals caused by distortion of the duty cycles of, and offset between, $\phi_A$ and $\phi_B$. The frequency division approach is more flexible than the other approaches in terms of changing the frequency, but the off-chip ECL delay line approach would be the simplest in terms of design.

3.4.4  Non-Random  Phase  Jitter

The discussions of noise have assumed that the phase noise distortions all consisted of basically 3 classes of noise: fast transient jitter, low-frequency phase skew and dc phase offsets. The synchronizer and data transmission protocol were designed with these three types of noise in mind. There is another type of phase noise which the current implementation does not try to handle: phase jitter caused by distortion of the duty cycle of the data signal.

A CMOS inverter with equal-width pullup and pulldown devices will distort a data signal due to the differences between the characteristics of the n- and p-type devices. Such distortions can be designed around to a certain extent by adjusting the ratio of the device widths in the gates and by other more involved techniques. Such adjustments only work perfectly for one particular set of device characteristics, and the characteristics of the n- and p-type devices do not necessarily track each other over process variations. Any pair of chips may be fabricated at different times, and the resulting variation in the sending chip's output buffers and the receiving chip's input circuitry can result in different phase margins for rising and falling transitions; the setup and hold requirements for the latches are also slightly different for falling and rising transitions.
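A minimal model of this distortion delays every rising edge by one propagation delay and every falling edge by another. The sketch below, with hypothetical delay values, shows that HIGH pulses shrink by $t_{pLH} - t_{pHL}$, so the synchronizer sees rising and falling edges at two different phases. The edge-list representation and function names are illustrative assumptions.

```python
def buffer_output_edges(edges, t_plh, t_phl):
    """Delay each (time, new_level) edge by the rise or fall propagation delay."""
    return [(t + (t_plh if level == 1 else t_phl), level) for t, level in edges]

def pulse_widths(edges):
    """Widths of the HIGH pulses in a list of alternating (time, level) edges."""
    return [t1 - t0
            for (t0, l0), (t1, l1) in zip(edges, edges[1:])
            if l0 == 1 and l1 == 0]

# Hypothetical buffer: rising edges 2 ns late, falling edges 1 ns late.
edges_in = [(0, 1), (10, 0), (20, 1), (30, 0)]
edges_out = buffer_output_edges(edges_in, t_plh=2, t_phl=1)
print(edges_out, pulse_widths(edges_in), pulse_widths(edges_out))
```

Because a fixed mismatch shifts all rising edges relative to all falling edges, it behaves like a data-dependent phase offset rather than random jitter, which is why the text treats it as a separate class of noise.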

Future implementations may have to take this distortion into account in order to avoid having the synchronizer fail to make a delay choice due to seeing different phases for rising and falling edges.

3.4.5  Area  Requirements  for  DDA  Synchronization 

Although the first goal of the DDA synchronizer was to allow high-speed data transmissions directly between MOS chips, the second goal was almost as important: keep the technique and circuitry simple enough to allow the synchronizer to be used on every wire in a system. This would mean placing as many as 50-100 synchronizers on a single chip; the area required must still be only a small percentage of the entire chip area in order for such wide-scale utilization to be possible. Determining a maximum size beyond which the synchronizer would no longer be useful is fairly arbitrary; the criterion used here is that the synchronizer must be no larger than the high-speed pad driver being used.

Given this restriction, the current implementation is almost 7 times too large to be used widely. The 3 μm synchronizer implementation is 1900 by 900 microns, compared to an experimental 50 MHz low-power pad driver which is 375 by 675 microns. This section will discuss how the next implementation can attack the problem of shrinking the synchronizer by a factor of 8.
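The "almost 7 times" figure follows directly from the two footprints quoted above:

$$\frac{1900 \times 900}{375 \times 675} = \frac{1.71 \times 10^{6}\ \mu\mathrm{m}^2}{2.53 \times 10^{5}\ \mu\mathrm{m}^2} \approx 6.76$$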
The area utilized by the synchronizer is divided between 5 major components: delay lines, selection logic, filter logic, MUX circuitry and empty space. A rough accounting of the space used by the different components gives the following breakdown:

Delay Lines: 1.0 × 10⁵ sq. microns
Selection Logic: 2.2 × 10⁵ sq. microns
Filter Logic: 6.5 × 10⁵ sq. microns
MUX Circuitry: 1.0 × 10⁵ sq. microns
Empty Space: 6.4 × 10⁵ sq. microns
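The short tally below reads each component's area as a multiple of $10^5$ square microns; with that reading the components sum exactly to the 1900 by 900 micron footprint, which is a useful consistency check. It then reports each component's share and the ratio of control logic to the delay lines plus MUX.

```python
# Component areas in square microns (exponents read as 10^5).
AREA_UM2 = {
    "delay lines": 1.0e5,
    "selection logic": 2.2e5,
    "filter logic": 6.5e5,
    "MUX circuitry": 1.0e5,
    "empty space": 6.4e5,
}

total = sum(AREA_UM2.values())
control = AREA_UM2["selection logic"] + AREA_UM2["filter logic"]
datapath = AREA_UM2["delay lines"] + AREA_UM2["MUX circuitry"]

print(f"total {total:.3g} um^2 (footprint {1900 * 900:.3g} um^2)")
for name, area in AREA_UM2.items():
    print(f"{name:16s} {100 * area / total:5.1f}%")
print(f"control / datapath = {control / datapath:.2f}")
```

The control logic occupies about 4.35 times the area of the delay lines and MUX, and empty space is about 37% of the total, which is consistent with the two observations that follow.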

An immediate observation is that simply compacting the existing layout to remove as much of the empty space as possible would probably reduce the area by 20-30%. The second observation is that most of the area is allocated to the control circuitry, as opposed to the circuitry that actually does the useful work: the delay lines and the MUX. Each of the areas will be considered individually to estimate how much improvement is possible over the existing design; the area reductions listed in the discussion of each section refer to the reduction of the area required for that particular section, not for the entire synchronizer.

The  Delay  Lines 

The description of the delay lines in Section 3.3.1 gave 3 approaches to simplifying the circuitry required for the delay lines. The most promising approach reduced the circuitry from 10 latches to just 4, resulting in an area savings of 60%. One of the 4 remaining latches could also be removed at the expense of a slight increase in clock circuitry complexity.

The  Control  Section 

The area required for the selection and filter logic can be reduced at the expense of lengthening the period of the control clock. A major conceptual change would be to unpipeline all of the control logic by removing most of the dynamic latches and making the control logic combinatorial. Also, the transistor sizes used in the control logic could be reduced; the current implementation makes very liberal use of donut transistors in order to increase the speed of the control logic*.

The effect of the unpipelining will be discussed separately for the selection and filtering logic, but the reduction in area due to using smaller transistors should be fairly uniform. Given the current 3 μm design rules, a simple inverter using donut transistors requires roughly 2.5 times as much area as an inverter with only slightly larger than minimum size devices. This is too optimistic an estimate to apply to all of the control logic; a reduction of the area by a third is more realistic and is the reduction factor that will be used.

The  Selection  Logic 

The  main  reduction  which  is  unique  to  the  selection  logic  is  the  removal 
of  the  4-flop;  this  alone  would  amount  to  a  reduction  of  30%.  Without  using 
a  different  type  of  bistable  latch,  the  area  required  for  the  latch  would  be 
hard  to  reduce  drastically.  Replacing  the  single  dynamic  latch  followed  by  2 
buffers  with  2  smaller  buffers  directly  driven  by  the  static  latch  would  reduce 
the  area  by  another  15%. 

The  Filter  Logic 

The unpipelining process would simplify the circuitry needed for the Candidate/Challenger comparison process; this circuitry presently accounts for roughly 40% of the filter logic. A single dynamic latch to hold the Candidate selection, inverters to generate the complementary signals and the XOR gate should be sufficient to perform the comparison. Given such a drastic cut in the circuitry, only a rough estimate can be made, but an area reduction of 50% seems reasonable provided the device sizes are also scaled down.

* Donut transistors use a square donut of polysilicon set in a large region of active area. The hole in the donut is usually used as the drain of the transistor, while all of the diffusion surrounding the donut forms the source. In this manner, a transistor which is more than 4 times the minimum size can be formed with no more parasitics than a minimum-size device would have.

A  design  change  that  could  reduce  the  filter  area  significantly  would  be 
to  encode  the  selection  signals  into  2  bits  rather  than  using  all  4  bits  in  the 
selection  filtering  process.  This  would  add  a  little  overhead  to  encode  and 
decode  the  selection  but  the  circuitry  needed  for  the  comparison  part  of  the 
filter  would  be  cut  in  half. 
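As a sketch of the encoding idea, assume (hypothetically) that the four selection signals are one-hot, one line per delay. Encoding them into a 2-bit index lets two selections be compared with 2 XORs instead of 4, at the cost of a small encoder and decoder. The function names below are illustrative.

```python
def encode(one_hot):
    """4-bit one-hot delay selection -> 2-bit binary code (msb, lsb)."""
    if len(one_hot) != 4 or sorted(one_hot) != [0, 0, 0, 1]:
        raise ValueError("expected exactly one active selection bit")
    index = one_hot.index(1)
    return (index >> 1) & 1, index & 1

def decode(code):
    """2-bit binary code -> 4-bit one-hot selection for driving the MUX."""
    index = (code[0] << 1) | code[1]
    return tuple(1 if i == index else 0 for i in range(4))

def same_selection(a, b):
    """Compare two encoded selections: 2 XORs instead of 4."""
    return ((a[0] ^ b[0]) | (a[1] ^ b[1])) == 0

print(encode((0, 0, 1, 0)), decode((1, 0)))
print(same_selection((1, 0), (1, 0)), same_selection((1, 0), (0, 1)))
```

The comparison hardware halves, matching the estimate above, while the encode/decode logic is the "little overhead" the paragraph mentions.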

The best way to shrink the counter section of the control circuitry is not fully understood at this time. One factor that is still unknown is just how much filtering action is really needed; an area reduction of 50% would be automatic if the $N_f$ delay of the filter were cut in half by shortening the counter to 2 bits from 4. Other options include using a simple binary counter which is reset when the selection comparison fails, or some other type of counting technique. An entirely different scheme could greatly simplify the circuitry required. At this time, the only guaranteed reduction would come from reducing the device sizes, which has already been estimated to produce a reduction of 33%.
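One of the options mentioned, a simple binary counter that is reset whenever the Candidate/Challenger comparison fails, can be sketched behaviourally as below. This is a hypothetical model, not the circuit used in the thesis: a new selection is adopted only after `n_f` consecutive agreeing observations, and the counter saturates so that it fits in a fixed number of bits (a 2-bit counter can reach `n_f = 3`, a 4-bit one `n_f = 15`).

```python
class SelectionFilter:
    """Hypothetical Candidate/Challenger filter built on a resettable counter."""

    def __init__(self, n_f):
        self.n_f = n_f          # agreeing observations needed before switching
        self.selected = None    # delay selection currently driving the MUX
        self.candidate = None
        self.count = 0

    def observe(self, challenger):
        if challenger == self.candidate:
            self.count = min(self.count + 1, self.n_f)  # saturating counter
        else:
            self.candidate, self.count = challenger, 1  # comparison failed: reset
        if self.count == self.n_f:
            self.selected = self.candidate
        return self.selected

f = SelectionFilter(n_f=3)
print([f.observe(c) for c in [2, 2, 1, 2, 2, 2, 2]])
```

A single disagreeing observation restarts the count, so isolated noisy decisions never reach the MUX; shortening the counter shortens the filter delay at the cost of less noise immunity.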


The  Delay  MUX  and  Selection  Latch 

The delay MUX is the only part of the synchronizer besides the delay lines that must be fast; consequently, reducing the area of the MUX by a noticeable amount will require a different design approach, and no estimate of the area savings will be made.

The Select Latch does not have to run fast provided the chip circuitry knows not to use the output of the synchronizer for $r_c$ clock cycles after raising Change_Enable. The circuitry of the latch could be simplified somewhat to reduce the number of additional buffers required; combined with reducing the sizes of the transistors, the net area savings should be around 50% for the latch. Since the latch currently accounts for half of the MUX/Latch circuitry, the total reduction will be around 25%.
Design  Changes

The incremental changes suggested for each section of the synchronizer, and the general shrinking of the control circuitry, will go a long way toward getting the reduction in size that is needed; however, achieving a reduction by a factor of 7 or more will be difficult without major changes in the structure of the synchronizer. The discussion of the maximum effective data rate pointed out that, even with skewing of the data due to unmatched clock frequencies, each synchronizer only needs to examine the input transitions for a small portion of the time in order to monitor the phase and make corrections to the delay selection. This opens the possibility of sharing a single block of control circuitry between 2 or more synchronizers; since the control circuitry is the largest portion of the synchronizer, amortizing the circuitry over several input ports would produce a big savings in area.
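A simple way to see the benefit of sharing is to split the area into a per-port datapath (delay lines plus MUX) and a control block that is divided among the ports sharing it. The sketch below uses the current layout's component areas, reading their exponents as $10^5$ square microns, and deliberately ignores empty space and any extra arbitration logic that sharing would need, so it illustrates the trend rather than reproducing the thesis's estimates.

```python
def per_port_area(datapath_um2, control_um2, ports_sharing):
    """Area per input port when one control block serves `ports_sharing` ports.

    Ignores empty space and any extra multiplexing logic sharing would need.
    """
    if ports_sharing < 1:
        raise ValueError("ports_sharing must be >= 1")
    return datapath_um2 + control_um2 / ports_sharing

# Current layout: delay lines + MUX = 2.0e5 um^2, selection + filter = 8.7e5 um^2.
for n in (1, 2, 4):
    print(n, per_port_area(2.0e5, 8.7e5, n))
```

Because the control logic dominates, sharing it between just 2 ports removes roughly 40% of the per-port area in this simplified model.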

Conclusion 

Totalling all of the proposed simplifications and reductions without sharing the control sections produces an estimated reduction in the area of the synchronizer to around 6.3 × 10⁵ square microns; this is a reduction of less than a factor of 3. Sharing the control logic between 2 synchronizers would boost the reduction factor to 4.5 times, and then cutting the filter in half would produce a cumulative area reduction by a factor of 5.5.

Further sharing of the control logic and encoding the selection bits, coupled with a more clearly thought-out design in general, should eventually reduce the area to a manageable size.


3.5  An  Analog  Approach 


An alternative to the approach just described was also investigated; this alternative was more analog in nature and was based on the approach used to construct a single-chip Ethernet transceiver [1]. This approach uses a Voltage Controlled Delay Line (VCD) and takes advantage of the good on-chip parameter tracking which MOS technologies provide. The VCD is used to construct a ring oscillator which can in turn be used as the Voltage Controlled Oscillator (VCO) in a Phase Locked Loop (PLL).

Figure 3.13: Block Diagram of the Analog Based DDA Synchronizer

A block diagram of a DDA synchronizer using this approach is shown in Figure 3.13. This version of the synchronizer uses the multiple delays variation of the DDA block diagram, with the slight difference that all of the delayed versions are generated by a single delay line. The most important circuit components in this version are those which collectively control the delay line: the VCD, the Phase Frequency Detector (PFD) and the Loop Filter (LF). This approach involves the design of a Phase Locked Loop with all of the associated complications due to stability issues, frequency acquisition, lock detection, etc. Since this was not the approach chosen, a discussion that really did justice to these issues is not within the scope of this thesis; only a brief description of the operation of the analog DDA synchronizer will be given. The subject of PLLs is treated very thoroughly by Gardner in [10] and [11].

When the computer is first powered up, or if the synchronizer is reset, the reference frequency (probably the local clock) will be used to bring the PLL into lock. Assuming the system is designed properly, the feedback control provided by the Phase Frequency Detector (PFD) will gradually adjust $V_c$ until the frequency of the VCO is exactly equal to the reference frequency; the phase of the VCO will also probably be adjusted to equal that of the reference signal. Once the PLL is in lock, and depending on whether the ring oscillator's output is divided down or not, the delay of the VCD will be either exactly equal to the clock period or some fraction of the period.

The same control voltage is also applied to another VCD which is designed to have identical characteristics to the VCD used in the VCO. In this manner, when the PLL is locked onto the reference frequency, the delay of the second VCD will be known almost exactly*. By applying the data input to this VCD and taking taps off the delay line at appropriate points, a variety of delayed versions of the input are generated. The selection of the proper delay could then be done in much the same manner as for the digital synchronizer.
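Under the idealization that the locked loop sets the replica VCD's total delay to exactly one clock period and that all of its elements are identical, tap $k$ of an $N$-element line delays the input by $kT_c/N$. The sketch below computes the tap delays and lets hypothetical per-element errors be added, to show how element mismatch accumulates along the line; the 20 ns period and the 0.1% error are illustrative values only.

```python
def tap_delays(total_delay, n_elements, mismatch=None):
    """Cumulative delay at each tap of an n-element replica delay line.

    With matched elements each contributes total_delay / n_elements;
    `mismatch` (hypothetical) gives per-element fractional errors.
    """
    mismatch = mismatch if mismatch is not None else [0.0] * n_elements
    if len(mismatch) != n_elements:
        raise ValueError("need one mismatch value per element")
    per_element = total_delay / n_elements
    taps, t = [], 0.0
    for err in mismatch:
        t += per_element * (1 + err)
        taps.append(t)
    return taps

print(tap_delays(20.0, 4))                 # matched elements, T_c = 20 ns
print(tap_delays(20.0, 4, [0.001] * 4))    # every element 0.1% slow
```

With four taps on a one-period line the taps fall at the same $T_c/4$ spacing the digital synchronizer derives from its four clock phases.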

The main reason for not choosing this approach was the difficulty of designing a high-frequency PLL which would be stable over all the processing variations. High-frequency operation would require a very short VCD consisting of a few voltage-controlled delay elements; variations in the delay of each element can result in variations by a factor of 2 or more in the basic frequency of the VCO. Accommodating this wide variation requires a delay element which is fairly sensitive to the control voltage, but a high-gain VCO makes the entire system less stable. This situation is further compounded when the synchronizer is fabricated through MOSIS, since the process variations can only be guessed at. The digital synchronizer seemed much more likely to be functional even given wide processing variations.

* The designers at Xerox claimed to be able to control the delay to within a 0.1% error.

CHAPTER  3.  DYNAMIC  DELAY  ADJUSTMENT


3.6  Testing  Results 

The CMOS implementation of the DDA synchronizer has been fabricated through the MOSIS fabrication service [6]. The testing of the chips is the subject of this section.

The testing process was intended to be a 2-step process: low-speed testing to ensure the chips were functionally correct, followed by high-speed testing to determine how fast the chips were. Instead a four-stage process was required: functional testing of the first chips, internal probing to determine why the first chips did not work, functional testing of the second set of chips and finally high-speed testing of the second set of chips*.

3.6.1  Generating  the  Clocks 

The most difficult part of both the low-speed and high-speed testing was generating the clock signals. In both cases, the four clock phases were generated by splitting 2 offset clock phases as described in Section 3.4.3. For the low-speed clocks, High-Speed CMOS NOR gates and NAND gates were used to generate $\phi_1$ through $\phi_4$ and their complements; $\phi_1$ and $\phi_3$ were used directly as the control clocks without any frequency division, since the clock frequency was already so low. For the high-speed clocks, ECL NOR gates were used to generate $\phi_1$ through $\phi_4$; the control signals were generated by dividing down with a 4-bit ECL binary counter and then using NOR gates to split the clock into $\phi_{1S}$ and $\phi_{3S}$ phases. The ECL-level clocks were converted to full CMOS levels by using ECL-to-TTL translators with the +5 Volt power supply set to +6 Volts in order to achieve almost full CMOS level swings. The translators were designed for differential inputs, so the complementary versions of the clocks were generated by driving the negative-true input of the translators instead of the positive-true input.

* The last step is still in progress.

A Tektronix 9100 DAS logic analyzer was used to provide the $\phi_A$ and $\phi_B$ signals for the low-speed circuitry. The DAS can provide output signals up to a clock frequency of 25 MHz; by making one cycle of $\phi_A$ equal to 8 DAS cycles, generating the offset between the clocks was simply a matter of offsetting the $\phi_B$ output from the $\phi_A$ output by 2 cycles. The high-frequency versions of the $\phi_A$ and $\phi_B$ clocks were generated using a Tektronix PG501 250 MHz pulse generator as a trigger source for another PG501 and a PG507 50 MHz generator.
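The low-speed clock plan reduces to a little exact arithmetic, sketched below with Python's `fractions` module so that no rounding creeps in. With the DAS at 25 MHz, 8 DAS cycles per $\phi_A$ period and a 2-cycle offset give $T_c = 320$ ns and an offset of exactly $T_c/4$; the function name is illustrative.

```python
from fractions import Fraction

def das_clock_plan(das_hz, cycles_per_period, offset_cycles):
    """Return (T_c, phi_A-to-phi_B offset, offset as a fraction of T_c), in seconds."""
    das_cycle = Fraction(1, das_hz)
    t_c = cycles_per_period * das_cycle
    offset = offset_cycles * das_cycle
    return t_c, offset, offset / t_c

t_c, offset, frac = das_clock_plan(25_000_000, 8, 2)
# T_c = 320 ns (3.125 MHz), offset = 80 ns = T_c/4
print(float(t_c), float(offset), frac)
```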

The most questionable aspect of the technique used to generate the high-speed clocks was the ECL-to-CMOS level conversion process. There are no ECL-to-CMOS level converters explicitly available, so the choice was to build discrete level converters or to try to use the ECL-to-TTL translators that are available; the latter approach was chosen. Because the TTL outputs will only pull up to within 1-1.5 volts of the HIGH voltage rail, the +5 V supply input to the ECL-to-TTL chips was set at approximately 6 Volts. While the low output level of the translators is not 0 Volts, it was low enough that no effort was made to adjust it. The cross-coupled NOR gates are only guaranteed to produce non-overlapping signals directly at the outputs of the NOR gates; the amount of non-overlap time is roughly one gate delay, which for the ECL 10KH family of gates is around 1.0-2.0 nanoseconds. The specified propagation delays of the ECL-to-TTL translators range from 1.0 to 3.6 nanoseconds. Since this is of the same magnitude as the expected non-overlap time, some care must be taken to ensure that the clocks are still non-overlapping after the level conversion; this is not too difficult, since there are 4 translators per package and there are never more than 4 signals whose edges must be carefully controlled*.

* For example, although $\phi_1$ and $\phi_3$, and $\phi_{1S}$ and $\phi_{3S}$, must be non-overlapping, the relation between … and $\phi_3$ is not critical except in relation to the noise margin.
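The need for care can be seen with a worst-case budget, sketched below under a simplifying assumption: the non-overlap surviving translation is the original non-overlap minus the largest possible difference between the two clocks' translator delays. Using the quoted ranges gives a negative result, meaning the clocks could overlap; the 0.3 ns spread for translators sharing a package is a hypothetical figure, not a datasheet value.

```python
def worst_case_nonoverlap(nonoverlap_ns, tpd_min_ns, tpd_max_ns):
    """Non-overlap remaining after translation, if the two clocks' translator
    delays can differ by as much as tpd_max - tpd_min."""
    return nonoverlap_ns - (tpd_max_ns - tpd_min_ns)

# Worst case from the quoted ranges: 1.0 ns non-overlap, 1.0-3.6 ns translators.
print(worst_case_nonoverlap(1.0, 1.0, 3.6))  # negative: the clocks may overlap
# Hypothetical: translators in one package matched to within 0.3 ns.
print(worst_case_nonoverlap(1.0, 1.0, 1.3))
```

Routing each critical pair through translators in the same package is what makes the much smaller, matched spread a reasonable expectation.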
3.6.2 Input and Output Signal Generation and Sampling

Besides the clock signals, the synchronizer requires very few other signals:
one data input, a reset signal and a Change_Enable signal. The synchronizer
produces only 2 outputs: the output data signal and the Select signal.

Once again the slow-speed case was straightforward, since the DAS could be
used to generate all of the input signals and sample the output signals.
Generating φB only required a period of 4 DAS cycles for the synchronizer
signals. By slowing the signals down to 8 DAS cycles, however, the phase of
the input could be easily controlled; transitions could be placed in the
center of any of the 4 phase windows.

This slower signal also made it possible to tell which delay line was being
chosen by the selection circuitry without access to any of the internal
signals. Figure 3.14 illustrates the series of 4 short pulses used to
determine which delay is being chosen. The important characteristic of these
signals is that each pulse overlaps one of the falling edges of the sampling
clocks. In the figure, the initial data pattern was transitioning between the
falling edges of φ3 and φ4. However, it is impossible to tell which delay is
being chosen simply by looking at the output, because the output is latched
by φ3. The short pulses are only seen by the clocks that go LOW during the
pulse, so only one of the 4 pulses is seen by the synchronizer for any
particular choice of delay. The fact that φ2 was being chosen as the sampling
clock is reflected by the output going HIGH after the φ2 pulse but not after
any of the other 3 pulses. In this manner the operation of the synchronizer
could be fully checked at low speeds without needing any internal signals.
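The pulse-based check can be modeled in a few lines. The model is an idealization, not the actual test setup, and rests on three assumptions:

- The four sampling clocks φ1 to φ4 have evenly spaced falling edges within one period.
- The chosen clock samples the data exactly at its falling edge.
- Each test pulse is narrower than the spacing between edges.

All function and parameter names are hypothetical.

```python
def falling_edges(period, phases=4):
    """Falling-edge times of the sampling clocks phi1..phi4 within one period."""
    return [k * period / phases for k in range(phases)]

def pulse_train(edges, width):
    """One short pulse centered on each falling edge, as (start, end) pairs."""
    return [(t - width / 2, t + width / 2) for t in edges]

def sampled_high(pulse, sample_time):
    """True if the pulse is HIGH at the instant the chosen clock falls."""
    start, end = pulse
    return start <= sample_time < end

def identify_chosen_clock(chosen, period=4.0, width=0.5):
    """Apply each pulse in turn and list which ones the synchronizer saw.

    `chosen` is the 0-based index of the clock picked by the delay logic;
    in the real test it is unknown and only the output is observed.
    """
    edges = falling_edges(period)
    return [i for i, p in enumerate(pulse_train(edges, width))
            if sampled_high(p, edges[chosen])]

print(identify_chosen_clock(chosen=1))  # index 1 corresponds to phi2
```

Because exactly one pulse straddles each falling edge, the list always has a single entry. Its index names the sampling clock, which matches the argument in the text.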

The DAS was not fast enough to provide the clock signals for the high-speed
tests, but it was fast enough to generate the data and control signals. One of
the higher-order bits of the ECL counters used to generate the offset clocks
served as an external clock for the DAS. This provided an easily controlled
source of signals that were at least roughly aligned to the synchronizer
clocks. The DAS could not vary the phase of the data signal with this
technique, so a voltage-controlled delay line was used**. By varying the
voltage applied to the delay line, the phase of the input could be varied
widely.

Figure 3.14: The Data and Clock Signals Used to Verify the Operation of the
Delay Selection Circuitry.

3.6.3  Results:  Chip  Set  #1 

A test chip containing 2 synchronizers was submitted to MOSIS for fabrication
in September 1984. The first set of wafers failed the MOSIS acceptance tests,
so the run had to be refabricated; chips from the second run were received in
late January 1985. Once a simple test setup was assembled, it took only a few
days to determine that of the 8 chips received (16 synchronizers), only
7 synchronizers showed any signs of activity. Of those 7, only 4 seemed close
to working properly. Even these 4 seemed able to choose only one particular
delay line: the line whose input was sampled by φ4.


**Such  a  delay  line  just  happened  to  have  been  fabricated  on  the  test  chip. 
The next 3 weeks to a month were spent trying to determine why the chips did
not work. This process required using microprobes to examine internal nodes
within the synchronizer. The layout of the synchronizer had included overglass
cuts at selected places inside the synchronizer logic. These cuts were to
serve as probe openings, allowing the chosen internal nodes to be observed
with a very low capacitance probe. Due to an oversight at the time of layout,
the internal overglass cuts were not made large enough. As a result, the
openings were not cut through to the underlying metal lines during
processing. Without pre-cut openings, an ultrasonic cutter had to be used to
remove the overglass. This process was much more difficult than it appeared,
and a number of initial attempts resulted only in ruined chips and probe tips.


When internal nodes were finally examined, an interesting phenomenon was
observed at the output of one of the buffers driving the transition-detection
XOR gates: the signal remained HIGH for only half the clock period. This
pointed to a dynamic charge storage problem at the outputs of the dynamic
latches. To confirm this idea, the leakage current of the junctions needed to
be observed, at least indirectly. The first check for excess leakage was to
examine the DC standby current of the chip, which turned out to be on the
order of milliamps for some of the chips. Then, to ensure the diagnosis was
right, the junction characteristics were examined directly using the probe
station. The input protection resistor and diodes were used as sources of
isolated junctions whose characteristics could be measured; the results of
the measurements are shown in Figure 3.15. The figure plots junction current
vs. junction voltage. A positive voltage should forward bias the junction and
turn on the junction diode, while a negative voltage should reverse bias the
diode. The plot shows that the reverse-biased p+ diffusion-to-substrate
junctions were extremely leaky. This was accepted as sufficient explanation of
why the first chips did not work.
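Junction leakage shortens dynamic storage in a simple way. A leakage current $I$ discharges a node of capacitance $C$ by $\Delta V$ in $t = C\,\Delta V / I$. The sketch below makes this arithmetic concrete. The node capacitance, allowed droop and both leakage currents are hypothetical, illustrative values rather than measurements from these chips.

```python
def hold_time(c_node, delta_v, i_leak):
    """Time (s) for a constant leakage current to discharge c_node by delta_v."""
    return c_node * delta_v / i_leak

C_NODE = 50e-15   # F, hypothetical dynamic-node capacitance
DELTA_V = 2.5     # V, hypothetical droop before the next stage reads a LOW

for label, i_leak in [("normal junction", 1e-12), ("leaky junction", 1e-6)]:
    print(f"{label}: {hold_time(C_NODE, DELTA_V, i_leak):.3g} s")
```

Under these assumptions a microamp-level leak drains the node in about 125 ns. That is comparable to a clock period, which is consistent with a stored HIGH that survives for only part of the cycle.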
Figure 3.15: The Current-Voltage Characteristics of the PN Junctions of the
First Test Chips (junction current vs. junction bias voltage).


3.6.4  Results:  Chip  Set  #2 

A second test chip had also been submitted for fabrication. It was actually a
test chip for some new pads, with a synchronizer added as an afterthought. The
functional testing of the second chips went much more smoothly: within 2 days
of receiving the chips, their full functionality had been verified. The
synchronizer made the proper delay selection for all phases of the input
signal, and when the phase was changed, the synchronizer responded by changing
the delay selection properly.

The  high  speed  testing  of  these  functional  chips  is  still  proceeding. 


3.7  Conclusions 

This thesis has presented a circuit-based synchronization technique designed
to allow high-speed data transmission directly between MOS chips in a
synchronous system without a detailed analysis of the actual delays involved.
The technique provides phase jitter immunity of close to 1/4 of a clock
period; greater immunity is available at the expense of more clock generation
circuitry. An implementation of the technique has been fabricated. Test
results have verified the functionality of the synchronizer, and tests to
determine its speed limits are continuing. An estimate of the area requirement
indicates that the chip area could be reduced enough to make the overhead
associated with the synchronizer acceptable.


Appendix  A 


Linear Analysis of the DDA Flip-Flop


A number of studies have examined bistable latches implemented in technologies
ranging from bipolar MSI and LSI [4] to NMOS LSI and VLSI implementations.
However, no studies have been found in the literature that consider CMOS
bistable latches. This Appendix analyzes the particular type of latch used in
the CMOS implementation of the DDA synchronizer.

Figure A.1 shows the gate-level diagrams of two types of bistable latches.
Type (a) is probably the form most commonly implemented in bipolar and NMOS
technologies, while type (b) was chosen for use in the DDA synchronizer. The
latches operate in a similar manner and have similar failure characteristics.
The discussion in this Appendix is oriented towards the DDA latch, with
comments on the other type of latch where pertinent. There are a number of
different ways to implement bistable latches. The choice made for this
implementation was not completely arbitrary, but a more thorough analysis of
the alternatives could produce a latch with more desirable characteristics.
A.1 Bistable Latch Operation

Figure A.1(c) shows the timing diagram of the DDA latch under normal operating
conditions. When φ1 is HIGH, a transition on the Data input causes a
corresponding transition on the Data-bar signal. Together, Data and Data-bar
tend to force the outputs of the tristate latches to switch. To switch, the
tristate latches must overcome the cross-coupled inverters, which are trying
to hold the latch in its previous state. This fight between the tristate
latches and the cross-coupled inverters distinguishes the DDA latch from the
other latch, in which the AND gates naturally switch the latch regardless of
the relative strengths of the devices^. Once φ1 has gone LOW, the
cross-coupled inverters are isolated and will settle to one of the latch's
stable states. The latch's outputs must be at valid levels before φ2 goes LOW
in order to prevent illegal logic levels from propagating through the logic
that uses the latch's outputs.

Figure A.2 plots the static transfer characteristics of the latch's
cross-coupled inverters. The plot shows how the feedback present in the latch
results in a circuit with 3 equilibrium states:


• $V_Q = 0$ and $V_{\bar Q} = V_{DD}$

• $V_Q = V_{DD}$ and $V_{\bar Q} = 0$

• $V_Q = V_{\bar Q} = V_{inv}$

The first two states are the logically legal states of the latch, while the
third is the illegal metastable state. If the circuit is exactly in the
metastable state, it will remain there. If $V_Q$ and $V_{\bar Q}$ are displaced
from equality by even a small ε, however, the feedback will amplify the voltage
difference and the circuit will eventually settle in one of the logically
legal states. As will be shown shortly, this settling may take an arbitrarily
long time.
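The unbounded settling time can be seen in a toy simulation of two cross-coupled inverters. This is a hypothetical normalized model, not the thesis circuit:

- Each inverter is an idealized tanh curve with small-signal gain `gain` at its switching point.
- Each output node relaxes toward its inverter's output with time constant `tau`.
- The simulation starts from a differential of `eps` around the metastable point.

For small `eps` the differential grows as $e^{(gain-1)t/\tau}$. Each factor of 10 reduction in `eps` therefore adds roughly $\tau \ln 10/(gain-1)$ to the settling time, and `eps = 0` never resolves.

```python
import math

def settle_time(eps, gain=5.0, tau=1.0, vdd=5.0, dt=1e-3, t_max=100.0):
    """Time for two cross-coupled inverters to separate by vdd/2.

    Starts from V_Q - V_Qbar = eps around the metastable point vdd/2 and
    integrates tau * dv/dt = inv(v_other) - v with forward Euler.
    Returns math.inf if the latch has not resolved by t_max.
    """
    vinv = vdd / 2

    def inv(v):
        # Idealized inverter: slope -gain at vinv, output limited to [0, vdd]
        return vinv - (vdd / 2) * math.tanh(gain * (v - vinv) / (vdd / 2))

    v_q, v_qbar = vinv + eps / 2, vinv - eps / 2
    t = 0.0
    while abs(v_q - v_qbar) < vdd / 2:
        if t > t_max:
            return math.inf
        dq = (inv(v_qbar) - v_q) / tau
        dqbar = (inv(v_q) - v_qbar) / tau
        v_q, v_qbar = v_q + dq * dt, v_qbar + dqbar * dt
        t += dt
    return t

for eps in (1e-1, 1e-3, 1e-6, 1e-9):
    print(f"eps = {eps:.0e} V -> settling time = {settle_time(eps):.2f} tau")
```

The settling time grows only logarithmically as `eps` shrinks, but it has no upper bound.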

^Of course, in NMOS latches the devices must be sized correctly in order to
achieve the proper logic voltages, but that sizing is really a separate issue.
Figure A.2: The Transfer Characteristics of the Bistable Latch

A.2 Predicting the Failure Rate of CMOS Bistable Latches

The analysis of the CMOS bistable latch consists of two stages, which reflect
the two different operating modes of the latch. When a transition on the data
input begins propagating through the latch, the MOSFETs' operating
characteristics vary greatly due to the nonlinear nature of the devices.
Predicting the behavior of the circuitry in this nonlinear regime requires
fairly sophisticated modeling methods. If a clock edge manages to put the
latch into a metastable state, the circuitry will remain near the latch's
metastable point for a significant period of time. During this time the
circuitry can accurately be described as linear and time invariant.

The first step of the analysis uses a circuit simulator to model the operation
of the latch when the clock and data transitions try to force the latch into
the metastable state. From these simulations, the relationship between the
clock-to-data phase margin, $t_{PM}$, and the initial differential voltage
latched at the outputs of the bistable latch, $v_d(t_0)$, will be determined.
The second step uses a small-signal model of the latch to study the
amplification of $v_d(t_0)$. The results of these two steps can then be
combined to predict the probability that latch failures will occur.

A.2.1  Simplifying  Assumptions 

Definition  of  Latch  Failure 

The discussion of noise and phase margins in Section 2.5 defined latch failure
as the failure of the output voltages, $V_Q$ and $V_{\bar Q}$, to reach the
logic thresholds. These can be assumed to be equal to the thresholds of the
n- and p-type devices with no loss of generality. This definition of latch
failure is more conservative than necessary and would also cause some
problems with the linear analysis of the latch, so this Appendix uses a
slightly less conservative definition. The outputs of the dynamic latches
which latch $V_Q$ and $V_{\bar Q}$ on φ will be used to determine the failure
criteria. Further, it is assumed that as the static latch settles out of a
metastable state, $V_Q$ and $V_{\bar Q}$ change much more slowly than the
dynamic latches can switch. If $H(V)$ is the DC transfer characteristic of the
dynamic latches, then as the bistable latch moves towards a stable state it
can safely be assumed that $V_{Data} = H(V_{\bar Q})$.

Defining $V_m$ as the voltage at which $H(V_m) = V_{tn}$, a latch failure can
be defined to occur when neither $V_Q$ nor $V_{\bar Q}$ is greater than $V_m$
when the outputs are latched at T5. This conforms roughly to the definition of
latch failure used in [15] and seems to be a fairly reasonable definition. A
simulation of the static characteristics of the dynamic latch gives:

$$V_m = 2.85\ \text{V}$$

Latched  Inputs 

The inputs to the DDA synchronizer are completely asynchronous. The inputs to
the bistable latches, on the other hand, are all latched by dynamic latches.
Latching the inputs to the bistable latches limits the types of transitions
the bistable latches see: the input to the latches will not change for
$T_c/2$ seconds before the bistable latches' clock goes LOW. This stability
lowers the probability of the latch failing but does not prevent failures,
because the input could be held at a voltage that places the bistable latch in
a metastable state. The exact characteristics of the latch are sensitive to
the relationship between the clock signals driving the dynamic and static
latches, which adds another dimension to the analysis. To simplify the
analysis, the initial analysis of the latch assumes that the input to the
bistable latch is not clocked.

Symmetry 

Although the latch in the DDA synchronizer used only one output and was
therefore very asymmetrical, the analysis assumes a latch that is
symmetrically designed and loaded. This assumption greatly simplifies the
small-signal analysis of the latch without sacrificing much insight. Some
accuracy is obviously lost, but the analysis should give conservative results,
since the loading on both outputs was taken to be equal to the largest load
seen by the DDA synchronizer's latch. The asymmetrical latch can also be
analyzed, and a simple analysis is given in Section A.2.5. To achieve results
that can be interpreted reasonably, that analysis sacrifices at least as much
accuracy as the symmetrical one.

Noise 

The effect of noise on the operation of synchronizers has been studied by
several authors [15][8][12]. The types of noise considered, the techniques
used and the assumptions made varied somewhat. All of the studies modeled the
noise as purely random, with a normal probability distribution, zero mean
amplitude, and rms values small enough that the linear small-signal model was
still valid. Noise of this type arises from several sources. The most
prevalent are the active devices themselves, since MOS devices produce a
significant amount of thermal and shot noise, both of which produce normally
distributed noise currents [28]. Some other noise sources can be considered
small signal, while others must be treated as large-signal disturbances. Only
noise that can reasonably be modeled like thermal and shot noise affects the
considerations of this section. More deterministic, larger-amplitude noise
sources are discussed in Section 2.5.

Two of the studies analyzed noise effects in a fairly thorough analytical
manner [15][8]; the other studies used more empirical methods, including
graphical and textual arguments. All of the studies concluded that noise does
not affect the probability that a synchronizer failure will occur. The
conclusions can be summed up by the following argument. If the noise
amplitudes are truly random, then while the synchronizer is trying to escape
from the metastable region, noise is as likely to force the latch back into
that region as it is to force the latch out of it.
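The symmetry argument can be illustrated with a small Monte Carlo experiment. This is an illustration only, not a model taken from the cited studies, and it makes two assumptions:

- The initial differential voltage is uniformly distributed, as it is for an asynchronous input.
- A zero-mean Gaussian disturbance is added to each sample.

Because the density is flat near zero, noise moves as many samples into the small window $|v_d| < v_{small}$ as it moves out. The window width, noise rms and sample count are hypothetical choices.

```python
import random

def near_metastable_fraction(n=200_000, v_range=1.0, v_small=0.01,
                             noise_rms=0.005, seed=1):
    """Fraction of trials with |v_d| < v_small, without and with noise.

    v_d(t0) is drawn uniformly from [-v_range, v_range] (asynchronous input);
    the noisy case adds a zero-mean Gaussian disturbance to each sample.
    """
    rng = random.Random(seed)
    clean = noisy = 0
    for _ in range(n):
        v = rng.uniform(-v_range, v_range)
        clean += abs(v) < v_small
        noisy += abs(v + rng.gauss(0.0, noise_rms)) < v_small
    return clean / n, noisy / n

clean, noisy = near_metastable_fraction()
# Both fractions should be close to v_small / v_range = 0.01
print(f"without noise: {clean:.4f}, with noise: {noisy:.4f}")
```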

A.2.2  Simulation  of  the  CMOS  Bistable  Latch 

Most past studies have used some type of linear approximation to predict
$v_d(t_0)$. The inaccuracies these approximations introduce are not
significant enough to invalidate the results. Still, using a simulator to make
more accurate predictions costs no real insight into the problem.

The simulator used for these simulations, SPICE2G.5, effectively models many
of the second-order effects that are important in accurately predicting the
performance of MOS circuits. The models used by SPICE are fairly complicated,
and their accuracy depends heavily on how well the parameter values match the
specific process.

Figure A.3: SPICE2G.5 Simulation of the CMOS Bistable Latch

Many of the model parameters are empirical in nature. A common procedure for
determining them is to make a series of measurements on test structures
fabricated in the target process and then perform a global curve fit that
matches SPICE predictions to the observed behavior. This procedure yields the
best set of parameter values, but the values may have no intuitively obvious
relationship to the processing details. The SPICE models used for the
simulations in this thesis were extracted in this manner by the MOSIS
fabrication service [6].
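The sketch below illustrates extracting model parameters by curve fitting. It is much simpler than the global multi-parameter fit MOSIS performs. It fits only the two parameters of the classical square-law saturation model, $I_{DS} = \frac{k}{2}(V_{GS} - V_t)^2$, by linear least squares on $\sqrt{I_{DS}}$. The "measurements" are synthetic, generated from hypothetical values $k = 40\ \mu\text{A/V}^2$ and $V_t = 0.8$ V, so the fit should simply recover them.

```python
import math

def fit_square_law(vgs, ids):
    """Least-squares fit of sqrt(Ids) = sqrt(k/2) * (Vgs - Vt); returns (k, Vt)."""
    y = [math.sqrt(i) for i in ids]
    n = len(vgs)
    mx, my = sum(vgs) / n, sum(y) / n
    sxy = sum((x - mx) * (yy - my) for x, yy in zip(vgs, y))
    sxx = sum((x - mx) ** 2 for x in vgs)
    slope = sxy / sxx
    intercept = my - slope * mx
    k = 2 * slope ** 2          # slope = sqrt(k/2)
    vt = -intercept / slope     # zero crossing of sqrt(Ids)
    return k, vt

# Synthetic data from a hypothetical device: k = 40 uA/V^2, Vt = 0.8 V
vgs = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ids = [0.5 * 40e-6 * (v - 0.8) ** 2 for v in vgs]
k, vt = fit_square_law(vgs, ids)
print(f"k = {k * 1e6:.1f} uA/V^2, Vt = {vt:.2f} V")
```

A real extraction fits many more parameters to many device geometries at once. This is why, as noted above, the resulting values need not look physically intuitive.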

Figure A.3 shows the simulation of the bistable latch for one particular phase
relationship between the clock and data edges. In the simulation shown, the
rising input causes the Q-bar output and the Data-bar input to begin falling.
The Data-bar input lags the Data input by an inverter delay, so the Q output
does not start rising until the Q-bar output has fallen almost to $V_{tn}$.
This behavior is slightly different from that caused by falling input
transitions which violate the setup requirements of the latch. Figure 2.3(b)
shows how a falling transition causes the outputs to be latched much nearer
the metastable point of the latch. The analysis in this Appendix does not
depend on which type of transition causes the metastable states to appear.
All discussions will use the rising-transition response shown in Figure A.3 as
a reference example.

In this analysis, all time measurements are referenced to the point at which
the clock input equals $V_{tn}$; this time will be called $t_0$. The phase
margin of the input signal is still determined by the $V_{DD}/2$ points of
the Data and Clock signals. The change in time reference for the latch
responses should not affect the outcome of the analysis.

After $t_0$, the two cross-coupled inverters are isolated by the
high-impedance outputs of the tristate latches^. In the absence of noise, the
latch's response and final state depend completely on $V_Q(t_0)$ and
$V_{\bar Q}(t_0)$.

Since $V_Q(t_0)$ and $V_{\bar Q}(t_0)$ are both close to $V_{tn}$, the n-type
transistors are both almost cut off. As a result, the p-type pullups begin
pulling Q and Q-bar HIGH at identical speeds.

As $V_Q$ and $V_{\bar Q}$ increase, the n-type pulldowns gradually turn on,
initiating the regenerative action of the cross-coupled inverters. The maximum
loop gain is obtained at the metastable point of the latch. However, the gain
stays close to maximum as long as the operating point is within a threshold
or so of $V_{inv}$. The critical factor is that all of the transistors remain
saturated, and this will be the case as long as $|V_Q - V_{\bar Q}| < V_t$.

When $V_Q$ and $V_{\bar Q}$ both rise to the vicinity of the $V_{inv}$ point
of the inverters, the circuit is very nearly balanced at its metastable point.
Unless $V_Q$ and $V_{\bar Q}$ are exactly equal, the feedback present in the
latch will amplify the differential voltage, forcing the latch to a stable,
logically legal state. The plot suggests that this amplification of $V_Q$ and
$V_{\bar Q}$ to stable states is exponential in nature. The linear,
small-signal analysis in the next section confirms this.

For the particular arrangement of clock and data signals simulated, the latch
settles to the state $V_Q = 0$ and $V_{\bar Q} = V_{DD}$. This happens to be
the state of the latch before the transitions occurred. Had the data
transition occurred 50 picoseconds earlier, this outcome would have been
reversed. The characteristics of the latch in the critical switching region
can be determined by a series of simulations in which the data transition
point varies by only a few picoseconds. Simulations were performed with a
range of phase margins and both falling and rising Data signals. Figure A.4
plots the initial differential voltage,
$v_d(t_0) = V_Q(t_0) - V_{\bar Q}(t_0)$, versus the phase margin, $t_{PM}$.

^Assuming clock-bar rises above $V_{tn}$ at $t_0$ also.

For initial differential voltages of a few hundred millivolts or less, the
dependence of $v_d(t_0)$ on $t_{PM}$, written $v_{d0}(t_{PM})$, can be
approximated by a linear relationship having a slope $S$ and a zero point at
$t_{PM} = t_\Delta$:

$$v_{d0}(t_{PM}) = S\,(t_{PM} - t_\Delta) \qquad \text{(A.1)}$$

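Falling and rising inputs produce linear relationships with different slopes,
$S_F$ and $S_R$, and different zero points, $t_{\Delta F}$ and $t_{\Delta R}$.
The simulations indicated that

$$S_F = -3.9\ \text{V/ns}, \qquad t_{\Delta F} = 3.5\ \text{ns}$$

and

$$S_R = 2.8\ \text{V/ns}, \qquad t_{\Delta R} = 2.7\ \text{ns}.$$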

These relationships between $v_d(t_0)$ and $t_{PM}$ make it possible to
develop a simple expression for $P(|v_d(t_0)| < v_s)$, the probability that
$|v_d(t_0)|$ will be less than some value $v_s$. Due to the asynchronous
nature of the input, data transitions can be assumed to be equally likely at
any time. The probability density of both rising and falling transitions is
therefore $p_D = 1/T_D$, where $T_D$ is the average period between falling or
rising transitions and $T_D > 2T_c$. The probability
$P(|v_d(t_0)| < v_s)$ is found by adding the contributions of falling and
rising transitions. Each slope maps the voltage window $[-v_s, v_s]$ onto a
phase-margin window of width $2v_s/|S|$, so:

$$P(|v_d(t_0)| < v_s) = p_D\left(\frac{2v_s}{|S_F|} + \frac{2v_s}{|S_R|}\right) = 2\,p_D\,v_s\left(\frac{1}{|S_F|} + \frac{1}{|S_R|}\right) \qquad \text{(A.2)}$$
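The probability expression above is straightforward to evaluate. The sketch assumes the symmetric window $|v_d(t_0)| < v_s$ with uniform transition arrival, giving $2 p_D v_s (1/|S_F| + 1/|S_R|)$ per clock edge. It uses the simulated slopes $S_F = -3.9$ V/ns and $S_R = 2.8$ V/ns. The 1 mV window and the 100 ns average transition spacing in the example are hypothetical.

```python
def prob_small_vd(v_s, t_d, s_f=-3.9, s_r=2.8):
    """Probability per clock edge that |v_d(t0)| < v_s.

    v_s in volts; t_d is the average time between rising (or falling)
    data transitions in ns; s_f and s_r are the simulated slopes in V/ns.
    Each slope maps [-v_s, v_s] onto a phase-margin window 2 * v_s / |S|
    wide, and transitions are assumed uniformly distributed in time.
    """
    p_d = 1.0 / t_d  # transition density, per ns
    return p_d * (2.0 * v_s / abs(s_f) + 2.0 * v_s / abs(s_r))

# Hypothetical operating point: 1 mV window, one transition every 100 ns
print(f"{prob_small_vd(1e-3, 100.0):.3e}")
```

The result is linear in $v_s$. The exponential amplification derived in the next section converts a required output swing into an exponentially small $v_s$.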

A.2.3  Linear  Analysis 

When the bistable latch is placed very close to its metastable state, as in
the simulation illustrated in Figure A.3, the differential voltage may easily
remain less than a threshold for a significant period of time. During this
period, all 4 devices that make up the cross-coupled inverters operate in
their saturation regimes. In this region the variations in the device
characteristics are small enough to justify modeling the devices by their
linearized small-signal models.

Linearized  Small  Signal  Models 

Figure A.5 illustrates the small-signal models used for the n- and p-type
transistors. The model consists of a voltage-controlled current source, an
output conductance and 5 capacitances. The transconductances, $g_{mn}$ and
$g_{mp}$, and the output conductances, $g_{on}$ and $g_{op}$, can be found by
linearizing the relationship between drain current and the node voltages
around the operating point:

$$g_{mn} = \frac{\partial I_{DSn}}{\partial V_{GSn}} \qquad \text{(A.3)}$$

$$g_{mp} = \frac{\partial I_{SDp}}{\partial V_{SGp}} \qquad \text{(A.4)}$$

$$g_{on} = \frac{\partial I_{DSn}}{\partial V_{DSn}} \qquad \text{(A.5)}$$

$$g_{op} = \frac{\partial I_{SDp}}{\partial V_{SDp}} \qquad \text{(A.6)}$$

The capacitance model of the MOSFET is simplified slightly when the device is
saturated. Both the channel-to-substrate and the gate-to-drain capacitances
can be neglected, and in saturation the gate-to-source capacitance is
$C_{GS} = \frac{2}{3}C_{ox}WL$. The source and drain capacitances, $C_{SB}$
and $C_{DB}$, are the nonlinear capacitances associated with the diffusion
areas of the source and drain. They can be determined by extracting the
diffusion areas and perimeters from the layout. The other parasitic and load
capacitances can be found by extracting the layout in a similar manner. Because
the circuit is cross-coupled, a capacitance may also couple the Q and Q-bar
nodes together.

Figure  A.6  shows  the  resulting  small  signal  model  for  the  cross-coupled 
inverters.  The  symmetrical  nature  of  the  latch’s  devices  and  loads  results 
in  a  symmetrical  small  signal  model  also.  The  capacitances  shown  represent 
the  combination  of  all  of  the  device,  load  and  parasitic  capacitances. 

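KCL equations can be written for the 2 nodes in the small-signal model in
terms of $v_Q$ and $v_{\bar Q}$:

$$(C_L + C_c)\frac{dv_Q}{dt} - C_c\frac{dv_{\bar Q}}{dt} + v_Q\,(g_{on} + g_{op}) + v_{\bar Q}\,(g_{mn} + g_{mp}) = 0 \qquad \text{(A.7)}$$

$$(C_L + C_c)\frac{dv_{\bar Q}}{dt} - C_c\frac{dv_Q}{dt} + v_{\bar Q}\,(g_{on} + g_{op}) + v_Q\,(g_{mn} + g_{mp}) = 0 \qquad \text{(A.8)}$$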
Figure A.6: Small Signal Model for the CMOS Bistable Latch


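Equations A.7 and A.8 can be combined to determine
$v_c(t) = v_Q(t) + v_{\bar Q}(t)$ and $v_d(t) = v_Q(t) - v_{\bar Q}(t)$, the
common-mode and differential signals seen by the latch:

$$C_L\frac{dv_c}{dt} + v_c\,(g_{on} + g_{op} + g_{mn} + g_{mp}) = 0 \qquad \text{(A.9)}$$

$$(C_L + 2C_c)\frac{dv_d}{dt} + v_d\,(g_{on} + g_{op} - g_{mn} - g_{mp}) = 0 \qquad \text{(A.10)}$$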

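Equations A.9 and A.10 produce the common-mode and differential responses of
the metastable latch:

$$v_c(t) = v_c(t_0)\exp\left[-\frac{g_{on} + g_{op} + g_{mn} + g_{mp}}{C_L}\,(t - t_0)\right] = v_c(t_0)\exp\left[-\frac{t - t_0}{\tau_c}\right] \qquad \text{(A.11)}$$

$$v_d(t) = v_d(t_0)\exp\left[\frac{g_{mn} + g_{mp} - g_{on} - g_{op}}{C_L + 2C_c}\,(t - t_0)\right] = v_d(t_0)\exp\left[\frac{t - t_0}{\tau_d}\right] \qquad \text{(A.12)}$$

where $\tau_c = C_L/(g_{on} + g_{op} + g_{mn} + g_{mp})$ and
$\tau_d = (C_L + 2C_c)/(g_{mn} + g_{mp} - g_{on} - g_{op})$.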

These solutions predict that the common-mode signal will decay exponentially
while the differential signal will grow exponentially^. These predictions fit
well with the behavior illustrated in Figure A.3.


Estimation  of  the  Parameters 

To predict the latch's behavior quantitatively as well as qualitatively, the
parameters of the small-signal model must be determined. The capacitances can
be found by extracting the layout of the latch. Since the actual latch used
was asymmetrical, the largest loading predicted by the circuit extractor is
used for $C_L$. The worst-case load capacitance comes out to be
$C_L = 560 \times 10^{-15}$ F. There is no coupling capacitance, because of
the way the cross-coupled inverters were laid out and because $C_{GD} = 0$ is
assumed for the saturated transistors.

^Provided $g_{mn} + g_{mp} > g_{on} + g_{op}$, i.e., the inverter gain must be
greater than 1. This is already the case for restoring digital logic circuits
[12], so this restriction is not important.

Rather than linearizing the current relationships by hand, the $g_m$'s and
$g_o$'s were obtained from the operating-point information provided by SPICE.
The brief discussion of some second-order MOSFET effects in the next section
should make it apparent why SPICE was used for this purpose. With the
inverters placed at the metastable point of the latch, the small-signal
parameters were found to be:

$$g_{mn} = 6.74 \times 10^{-5}\ \text{A/V}$$

$$g_{mp} = 4.13 \times 10^{-5}\ \text{A/V}$$

$$g_{on} = 2.45 \times 10^{-6}\ \text{A/V}$$

$$g_{op} = 5.43 \times 10^{-6}\ \text{A/V}.$$

Given these parameters, the time constants of the common-mode and differential
signals can be determined:

$$\tau_d = 5.6\ \text{ns}$$

$$\tau_c = 4.8\ \text{ns}.$$
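As a quick arithmetic check, the two time constants follow from the common-mode and differential equations:

- $\tau_c = C_L/(g_{on}+g_{op}+g_{mn}+g_{mp})$
- $\tau_d = (C_L+2C_c)/(g_{mn}+g_{mp}-g_{on}-g_{op})$

The inputs are $C_L = 560$ fF, $C_c = 0$ and the SPICE operating-point values listed above. Which of the two transconductances (and the two output conductances) belongs to which device does not matter here, since only their sums enter.

```python
C_L = 560e-15                  # F, worst-case extracted load capacitance
C_C = 0.0                      # F, no coupling capacitance in this layout
g_mn, g_mp = 6.74e-5, 4.13e-5  # A/V, transconductances at the metastable point
g_on, g_op = 2.45e-6, 5.43e-6  # A/V, output conductances at the metastable point

tau_d = (C_L + 2 * C_C) / (g_mn + g_mp - g_on - g_op)  # differential growth
tau_c = C_L / (g_on + g_op + g_mn + g_mp)              # common-mode decay
print(f"tau_d = {tau_d * 1e9:.1f} ns, tau_c = {tau_c * 1e9:.1f} ns")
```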

These time constants are somewhat larger than expected. The type of bistable
latch chosen required the inverter devices to be small so that the dynamic
latches could overpower the inverters. These small devices, combined with a
rather sizable load of over ½ picofarad, produced a large RC time constant.
Even so, the settling time required to achieve an extremely low failure
probability was not long enough to seriously impact the performance of any
system using the synchronizer. An improved bistable latch design with a
shorter settling time could help reduce the area requirements of the
synchronizer by increasing the number of synchronizers that can share a single
set of control logic.
Second Order Effects in MOSFETs

The classical model of a MOSFET operating with $(V_{GS} - V_T) < V_{DS}$ predicts a quadratic dependence of $I_D$ on $(V_{GS} - V_T)$ but no dependence on the voltage at the drain. Several second order effects noticeably alter the current-voltage relationships, especially for devices with short channel lengths. Three of these second order effects will be discussed here to give a feeling for how they affect the analysis of the static latch: drain induced barrier lowering, carrier velocity saturation and channel length modulation. A thorough discussion of these and other characteristics is given in Chapter 2 of [12].

In short channel devices, the drain region acts like a second gate with a much thicker oxide; as the drain voltage rises above the substrate potential (or drops below it for p-type devices), the voltage on the drain tends to induce charge in the channel in the same manner as the gate does, effectively lowering the threshold of the device. This characteristic is modeled by introducing a feedback term to reflect the drain voltage dependence of the device thresholds:

$$V_{T_n}' = V_{T_n} - \sigma_n V_{DS}$$

$$|V_{T_p}'| = |V_{T_p}| - \sigma_p V_{SD} = |V_{T_p}| - \sigma_p (V_{DD} - V_D).$$
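This feedback term matters for the latch because it gives a saturated device a finite output conductance even within the otherwise drain-independent square-law model. A minimal sketch, assuming a square-law n-device that stays in saturation with $V_T' = V_{T0} - \sigma V_{DS}$; the parameter values (`BETA`, `V_T0`, `SIGMA`) are hypothetical and are not the thesis's device data. Differentiating gives $g_o = \beta\sigma(V_{GS} - V_T')$, which the numerical derivative should reproduce.

```python
# Square-law saturation current with a DIBL-lowered threshold.
# All parameter values are hypothetical, chosen only for illustration.

BETA = 1e-4    # A/V^2, device gain factor
V_T0 = 0.8     # V, threshold at zero drain bias
SIGMA = 0.02   # DIBL coefficient (dimensionless)


def i_d_sat(v_gs, v_ds, beta=BETA, v_t0=V_T0, sigma=SIGMA):
    """Saturated drain current (A), assuming the device stays in saturation."""
    v_t = v_t0 - sigma * v_ds          # threshold drops as the drain rises
    v_ov = max(v_gs - v_t, 0.0)        # gate overdrive
    return 0.5 * beta * v_ov ** 2


def g_o_numeric(v_gs, v_ds, dv=1e-3):
    """Output conductance dI_D/dV_DS by central difference (A/V)."""
    return (i_d_sat(v_gs, v_ds + dv) - i_d_sat(v_gs, v_ds - dv)) / (2 * dv)


v_gs, v_ds = 2.5, 2.5
v_ov = v_gs - (V_T0 - SIGMA * v_ds)    # overdrive at this bias point
g_o = g_o_numeric(v_gs, v_ds)
g_m = BETA * v_ov                      # square-law transconductance
print(f"g_o = {g_o:.3e} A/V (analytic {BETA * SIGMA * v_ov:.3e}), "
      f"g_m/g_o = {g_m / g_o:.1f}")
```

In this simple model the intrinsic gain $g_m/g_o$ is just $1/\sigma$, which shows directly how DIBL limits the inverter gain that the latch relies on.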

Accounting for velocity saturation effects is more involved. The simple model for the channel current treats the velocity of the carriers in the channel as directly proportional to the electric field present in the channel region. This relationship does hold for moderate field strengths; at higher field strengths the carriers reach a maximum velocity, $v_{MAX}$, and any further increase in the electric field does not increase their speed. In the classical model the electric field in a saturated device was limited to $V_{Dsat}/L = (V_{GS} - V_T)/L$; this dependence of the electric field on $(V_{GS} - V_T)$, coupled with a similar dependence of the charge in the channel, produced the quadratic dependence of the current on $(V_{GS} - V_T)$. Once the carrier velocities saturate, the drain current is only linearly dependent on $(V_{GS} - V_T)$.

At low field levels, the mobility of electrons is much higher than that of holes, and as a result the electrons travel at higher velocities; the saturation velocities of electrons and holes are much closer together than the low-field mobilities, so the performance of p-type devices is much closer to that of n-type devices when velocity saturation is a major factor.
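To illustrate both the quadratic-to-linear transition and the narrowing n/p gap, the following sketch swaps in a widely used interpolation between the two regimes (it is not the model used in this thesis): $I_D = W C_{ox} v_{MAX} V_{ov}^2/(V_{ov} + E_c L)$ with critical field $E_c = 2 v_{MAX}/\mu$. It reduces to the square law $\mu C_{ox}(W/L)V_{ov}^2/2$ when $V_{ov} \ll E_c L$ and to $W C_{ox} v_{MAX} V_{ov}$ when $V_{ov} \gg E_c L$. The mobilities, saturation velocities, oxide capacitance and geometry are hypothetical round numbers.

```python
# Interpolated velocity-saturation model (an illustrative substitute, not the
# thesis's device model). All numeric values below are hypothetical.

C_OX = 3.45e-3   # F/m^2, gate oxide capacitance per unit area
W = 10e-6        # m, channel width
V_OV = 2.0       # V, gate overdrive (V_GS - V_T)

# (low-field mobility in m^2/(V s), saturation velocity v_MAX in m/s)
ELECTRON = (0.05, 1.0e5)
HOLE = (0.0167, 0.8e5)


def i_d_sat(v_ov, length, mu, v_max, w=W, c_ox=C_OX):
    """Saturation current (A): square law for V_ov << E_c*L, linear for V_ov >> E_c*L."""
    e_c = 2.0 * v_max / mu          # critical field, V/m
    return w * c_ox * v_max * v_ov ** 2 / (v_ov + e_c * length)


def n_to_p_ratio(length):
    """Ratio of n-device to p-device current for identical geometry and bias."""
    return i_d_sat(V_OV, length, *ELECTRON) / i_d_sat(V_OV, length, *HOLE)


for length in (10e-6, 3e-6, 0.25e-6):
    print(f"L = {length * 1e6:5.2f} um: I_n / I_p = {n_to_p_ratio(length):.2f}")
```

For long channels the ratio approaches the mobility ratio; as $L$ shrinks it moves toward the ratio of saturation velocities, which is the trend described above.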

A third effect which is important in saturated MOSFETs is the dependence of the effective channel length on the drain voltage. If the potential in the channel of a saturated MOSFET, $\phi_s$, were measured starting at the source and progressing towards the drain, the potential would increase continuously until $\phi_s = V_{Dsat}$. At the point in the channel at which $\phi_s = V_{Dsat}$, the charge density would have decreased to the point that the channel was pinched off; any further increase in $\phi_s$ would cut off the current flow. A depletion region forms between the pinch-off point and the drain of the device; the excess drain voltage, $V_{DS} - V_{Dsat}$, appears as the voltage across the depletion region. As $V_{DS}$ increases, the voltage across the depletion region also increases, widening the depletion region. Any increase in the width of the depletion region shortens the effective length of the device, resulting in an increase in the current.

This channel length modulation effect is modeled by substituting an effective length for the actual length:

$$L' = L - \Delta L.$$

There are a number of ways to model $\Delta L$, with both empirical and physical models of varying complexity; one complicating factor is that $\Delta L$ is dependent on $L'$ and $L'$ is dependent on $\Delta L$.
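One way to resolve that circular dependence is a fixed-point iteration. The following sketch assumes two textbook ingredients that are not given in this section: the one-sided abrupt-junction depletion width $\Delta L = \sqrt{2\epsilon_{si}(V_{DS} - V_{Dsat})/(q N_A)}$, and a velocity-saturation-limited $V_{Dsat} = E_c L' V_{ov}/(E_c L' + V_{ov})$, which is what ties $\Delta L$ back to $L'$. The doping, critical field and bias values are hypothetical; only the 3 micron drawn length echoes the chip discussed in this appendix.

```python
import math

Q = 1.602e-19               # electron charge, C
EPS_SI = 11.7 * 8.854e-12   # permittivity of silicon, F/m


def delta_l(v_excess, n_a):
    """Depletion width (m) that absorbs V_DS - V_Dsat (one-sided abrupt junction)."""
    return math.sqrt(2.0 * EPS_SI * max(v_excess, 0.0) / (Q * n_a))


def v_dsat(v_ov, l_eff, e_c):
    """Saturation voltage with velocity saturation; note that it depends on L'."""
    return e_c * l_eff * v_ov / (e_c * l_eff + v_ov)


def effective_length(l, v_ds, v_ov, n_a, e_c, tol=1e-12, max_iter=100):
    """Solve L' = L - dL(V_DS - V_Dsat(L')) by fixed-point iteration."""
    l_eff = l                       # start from the drawn length
    for _ in range(max_iter):
        new = l - delta_l(v_ds - v_dsat(v_ov, l_eff, e_c), n_a)
        if abs(new - l_eff) < tol:
            return new
        l_eff = new
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical bias and process values (SI units).
L_DRAWN, V_DS, V_OV, N_A, E_C = 3e-6, 5.0, 2.0, 1e22, 4e6
l_eff = effective_length(L_DRAWN, V_DS, V_OV, N_A, E_C)
print(f"L' = {l_eff * 1e6:.3f} um; a 1/L current law would scale by "
      f"L/L' = {L_DRAWN / l_eff:.3f}")
```

With these values the coupling is weak (each pass changes $L'$ only slightly), so the iteration settles in a handful of steps; strongly coupled models may need damping or a root finder instead.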

A.2.4  Failure  Probability 

This section combines the results of the previous sections to obtain an equation for predicting the probability of failure of the bistable latch used in the DDA synchronizer implementation. A synchronization error was defined previously as occurring when neither $V_Q$ nor $V_{\bar{Q}}$ is greater than $V_M$ by the end of the settling time. Given the assumption of a balanced response, a failure can be defined as occurring when $|v_d(T_s)| < v_m$, where $v_m = 2(V_M - V_{inv})$. The probability of an error occurring is given by the integral of the probability density function for the differential voltage at the end of the settling time over the range of metastable voltages:


$$P_f = \int_{-v_m}^{v_m} p_v(v_d, T_s)\, dv_d \tag{A.13}$$

The response of the latch to an initial differential voltage, given by Equation A.12, indicates that for a failure to occur, that is, for $|v_d(T_s)| < v_m$, the initial differential voltage would have to satisfy

$$|v_d(t_0)| < v_m \exp\left[-\frac{T_s}{\tau_d}\right]. \tag{A.14}$$


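The probability density at the end of the settling time is therefore directly dependent on the probability density at $t_0$; the trailing factor accounts for the stretching of the voltage axis by $\exp[T_s/\tau_d]$:

$$p_v(v_d, T_s) = p_v\left(v_d \exp\left[-\frac{T_s}{\tau_d}\right], t_0\right) \exp\left[-\frac{T_s}{\tau_d}\right]. \tag{A.15}$$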


The results of Section A.2.2 indicate that the probability density of $v_d(t_0)$ for small differential voltages is linear and can be estimated as:

$$p_v(v_d, t_0) \approx \ldots \tag{A.16}$$


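Combining Equations A.13, A.15 and A.16 yields the probability of failure in terms of the settling time allowed, $T_s$, or the settling time required to achieve a given $P_f$. Because the failure window $|v_d(t_0)| < v_m \exp[-T_s/\tau_d]$ is symmetric about zero, the term of the linear density proportional to $v_d$ integrates to zero and only its value at $v_d = 0$ survives:

$$P_f = 2\, v_m\, p_v(0, t_0) \exp\left[-\frac{T_s}{\tau_d}\right] \tag{A.17}$$

$$T_s = -\tau_d \ln\left[\frac{P_f}{2\, v_m\, p_v(0, t_0)}\right] \tag{A.18}$$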
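The chain from Equation A.13 to Equation A.17 can be checked numerically. The following sketch draws initial differential voltages from a hypothetical linear density $p(v) = (1 + a v)/(2V_0)$ on $[-V_0, V_0]$, grows each one by $\exp[T_s/\tau_d]$ as the differential mode does, and counts how often $|v_d(T_s)|$ is still below $v_m$. The values of $V_0$, $a$, $v_m$ and the short 10 nsec settling time are illustrative only, chosen so that failures are frequent enough to count; $\tau_d = 5.6$ nsec is the value found above. The estimate should agree with $2 v_m p(0) \exp[-T_s/\tau_d]$ regardless of the slope $a$.

```python
import math
import random

TAU_D = 5.6e-9   # s, differential time constant found above


def analytic_pf(v_m, t_s, p0, tau_d=TAU_D):
    """Closed form: 2 * v_m * p(0) * exp(-T_s / tau_d)."""
    return 2.0 * v_m * p0 * math.exp(-t_s / tau_d)


def monte_carlo_pf(v_m, t_s, v0, slope, n, seed=1, tau_d=TAU_D):
    """Fraction of trials whose |v_d| is still below v_m after T_s of settling.

    Initial voltages follow the linear density (1 + slope*v)/(2*v0) on
    [-v0, v0], sampled by rejection (requires |slope*v0| < 1).
    """
    rng = random.Random(seed)
    ceiling = 1.0 + abs(slope) * v0
    gain = math.exp(t_s / tau_d)    # regenerative growth of the differential mode
    failures = accepted = 0
    while accepted < n:
        v_init = rng.uniform(-v0, v0)
        if rng.random() * ceiling > 1.0 + slope * v_init:
            continue                # rejected sample
        accepted += 1
        if abs(v_init * gain) < v_m:
            failures += 1
    return failures / n


# Illustrative values only.
V_M, T_S, V0, SLOPE = 0.5, 10e-9, 1.0, 0.5
mc = monte_carlo_pf(V_M, T_S, V0, SLOPE, n=200_000)
exact = analytic_pf(V_M, T_S, p0=1.0 / (2.0 * V0))
print(f"Monte Carlo P_f = {mc:.4f}, closed form = {exact:.4f}")
```

The closed form only holds while $v_m \exp[-T_s/\tau_d]$ stays inside the region where the density is linear; for realistic settling times that window is tiny, which is why the approximation is a good one.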
The specification for the maximum failure probability in the high speed multiprocessor was $P_f < 10^{\ldots}$; this assumed a 100 MHz clock, which is about twice as fast as the 3 micron chip can operate, but a factor of 2 in the $P_f$ specification is negligible. Assuming $T_D = \ldots = 20 \times 10^{-9}$ seconds:

$$T_s = -5.6 \times 10^{-9} \ln\left[\frac{10^{\ldots} \times 20 \times 10^{-9}}{2 \times 0.68\,(\ldots)}\right] = 240\ \text{nsec}.$$


In order to meet the failure probability specification, the dynamic latches would have to be given 240 nanoseconds, or 12 clock cycles, to settle.
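Because the settling time depends only logarithmically on the failure probability, tightening the specification is cheap. A small sketch of that arithmetic, using $\tau_d = 5.6$ nsec and the 20 nsec clock period implied by 240 nanoseconds being 12 cycles; the helper names are hypothetical.

```python
import math

TAU_D = 5.6e-9   # s, differential time constant
T_CLK = 20e-9    # s, clock period implied by 240 nsec = 12 cycles
T_S = 240e-9     # s, settling time required by the specification


def settling_per_decade(tau_d):
    """Extra settling time that lowers P_f by a factor of 10: tau_d * ln(10)."""
    return tau_d * math.log(10.0)


def cycles_needed(t_s, t_clk):
    """Whole clock cycles covering t_s (rounded first to guard against float noise)."""
    return math.ceil(round(t_s / t_clk, 9))


print(f"{settling_per_decade(TAU_D) * 1e9:.1f} nsec per decade of P_f")
print(f"{cycles_needed(T_S, T_CLK)} cycles of settling")
print(f"exp(-T_s/tau_d) = {math.exp(-T_S / TAU_D):.1e}")
```

Since every term scales with $\tau_d$, a latch design with half the time constant would need half the settling time for the same $P_f$.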


A.2.5 Solutions for Non-Symmetrical Loads

Solutions developed in Section A.2.3 gave expressions for the common-mode and differential gain of a symmetrically loaded latch. In the configuration implemented in the DDA synchronizer, only the Q-bar output was used, resulting in asymmetric loading on the latch outputs. The results must be extended in order to predict accurately the effect of the asymmetry on the latch's performance.

In order to simplify matters, two assumptions will be made. First, the coupling capacitance, $C_c$, will be ignored*. Second, the effect of drain induced barrier lowering on the device thresholds will be ignored. These two assumptions greatly simplify the problem, hopefully without sacrificing too much accuracy.

The  small  signal  model  resulting  from  these  simplifications  is  shown  in 
Figure  A.7.  The  dynamic  performance  of  this  circuit  can  be  described  by 
the  following  equation: 

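$$\frac{d}{dt}\begin{bmatrix} v_Q \\ v_{\bar{Q}} \end{bmatrix} = -\begin{bmatrix} 0 & \dfrac{g_{m_n} + g_{m_p}}{C_Q} \\ \dfrac{g_{m_n} + g_{m_p}}{C_{\bar{Q}}} & 0 \end{bmatrix} \begin{bmatrix} v_Q \\ v_{\bar{Q}} \end{bmatrix} \tag{A.19}$$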

The solutions to this equation will be of the form

$$v_Q(t) = k_1 \exp[\lambda_1 t] + k_2 \exp[\lambda_2 t] \tag{A.20}$$

$$v_{\bar{Q}}(t) = k_3 \exp[\lambda_1 t] + k_4 \exp[\lambda_2 t].$$

In this case it is straightforward to show:

$$\lambda_1 = -\lambda_2 = \frac{g_{m_n} + g_{m_p}}{\sqrt{C_Q C_{\bar{Q}}}}. \tag{A.21}$$

Figure A.7: The Simplified Small Signal Model of the DDA Synchronizer's Bistable Latch

*The small signal gain could be used to divide this Miller capacitance into its equivalent capacitances on the inputs and outputs of the inverters.


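Substituting into the expressions for $v_Q(t)$ and $v_{\bar{Q}}(t)$ produces, with $t$ measured from $t_0$:

$$v_Q(t) = \frac{1}{2}\left[\left(v_Q(t_0) - \sqrt{\frac{C_{\bar{Q}}}{C_Q}}\, v_{\bar{Q}}(t_0)\right)\exp[\lambda_1 t] + \left(v_Q(t_0) + \sqrt{\frac{C_{\bar{Q}}}{C_Q}}\, v_{\bar{Q}}(t_0)\right)\exp[-\lambda_1 t]\right] \tag{A.22}$$

$$v_{\bar{Q}}(t) = \frac{1}{2}\sqrt{\frac{C_Q}{C_{\bar{Q}}}}\left[-\left(v_Q(t_0) - \sqrt{\frac{C_{\bar{Q}}}{C_Q}}\, v_{\bar{Q}}(t_0)\right)\exp[\lambda_1 t] + \left(v_Q(t_0) + \sqrt{\frac{C_{\bar{Q}}}{C_Q}}\, v_{\bar{Q}}(t_0)\right)\exp[-\lambda_1 t]\right] \tag{A.23}$$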


The behavior predicted by Equations A.22 and A.23 could almost have been predicted simply by extrapolation from the responses predicted for the symmetrical load situation. The responses of $v_Q$ and $v_{\bar{Q}}$ consist of a common-mode response and a differential mode response, as evidenced by the presence of $(v_Q(t_0) - v_{\bar{Q}}(t_0))$ and $(v_Q(t_0) + v_{\bar{Q}}(t_0))$ factors in both responses. Further, the differential portion is growing exponentially while the common-mode portion is decaying exponentially. The nonsymmetry of the loading is reflected by the scaling factor $\sqrt{C_Q/C_{\bar{Q}}}$, which will cause the response of the more heavily loaded node to be slower than the other output.
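As a numerical cross-check on Equations A.22 and A.23, the following sketch integrates the simplified model $C_Q\, dv_Q/dt = -g_m v_{\bar{Q}}$, $C_{\bar{Q}}\, dv_{\bar{Q}}/dt = -g_m v_Q$ (with $g_m = g_{m_n} + g_{m_p}$, no coupling capacitance and no output conductance, as assumed in this section) using a fourth-order Runge-Kutta step, and compares the result with the closed form. $g_m$ is the SPICE value quoted earlier; the two node capacitances and the initial voltages are hypothetical, with $\bar{Q}$ given the heavier load to mimic the DDA configuration.

```python
import math

G_M = 6.74e-5 + 4.13e-5          # A/V, g_mn + g_mp from the SPICE operating point
C_Q, C_QB = 300e-15, 560e-15     # F, hypothetical asymmetric node loads


def closed_form(t, vq0, vqb0, g_m=G_M, c_q=C_Q, c_qb=C_QB):
    """Equations A.22/A.23 with t measured from t0."""
    lam = g_m / math.sqrt(c_q * c_qb)
    s = math.sqrt(c_qb / c_q)
    grow = vq0 - s * vqb0        # regenerative (differential-like) part
    decay = vq0 + s * vqb0       # decaying (common-mode-like) part
    vq = 0.5 * (grow * math.exp(lam * t) + decay * math.exp(-lam * t))
    vqb = 0.5 / s * (-grow * math.exp(lam * t) + decay * math.exp(-lam * t))
    return vq, vqb


def rk4(t_end, vq0, vqb0, steps=5000, g_m=G_M, c_q=C_Q, c_qb=C_QB):
    """Integrate C_Q dvQ/dt = -g_m vQb, C_QB dvQb/dt = -g_m vQ numerically."""
    def deriv(vq, vqb):
        return -g_m * vqb / c_q, -g_m * vq / c_qb

    h = t_end / steps
    vq, vqb = vq0, vqb0
    for _ in range(steps):
        k1 = deriv(vq, vqb)
        k2 = deriv(vq + 0.5 * h * k1[0], vqb + 0.5 * h * k1[1])
        k3 = deriv(vq + 0.5 * h * k2[0], vqb + 0.5 * h * k2[1])
        k4 = deriv(vq + h * k3[0], vqb + h * k3[1])
        vq += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        vqb += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return vq, vqb


vq0, vqb0, t_end = 1e-3, -0.5e-3, 5e-9
exact = closed_form(t_end, vq0, vqb0)
numeric = rk4(t_end, vq0, vqb0)
print(f"closed form: vQ = {exact[0]:.6e}, vQb = {exact[1]:.6e}")
print(f"RK4:         vQ = {numeric[0]:.6e}, vQb = {numeric[1]:.6e}")
```

Once the growing term dominates, the swing on the heavily loaded $\bar{Q}$ node is $\sqrt{C_Q/C_{\bar{Q}}}$ times the swing on $Q$, which is the slower response noted above.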